Text File | 1995-05-29 | 175.4 KB | 3,545 lines | [TEXT/KEEN] |
- ********************* hAWK User’s Manual *********************
- Copyright © 1991 the Free Software Foundation, Inc. You can redistribute or modify
- this file under the terms of the GNU General Public License as published by
- the Free Software Foundation (see the file “COPYING hAWK”).
- font: Geneva 10. Four spaces per tab.
-
- hAWK is NOT a stand–alone application: it must be called by some other application.
- Interaction between hAWK and the calling application will vary according to how
- well the calling application supports text documents. However, virtually any
- (C-based) application can add the ability to call hAWK. For details, see
- “Calling hAWK from your application” near the end of this manual.
-
- Applications which support calling hAWK (add yours to the list!):
- Minimal App (included, with source code)
- EnterAct 2, RFEdit
-
- You can read this document with any programmer’s editor (you may not see the 4
- pictures - they’re not that critical). You’ll need an editor to view the results of a
- program run if you use Minimal App to call hAWK, since Minimal App does not do
- anything with text files, and you’ll find that Minimal App, with its minimal level of
- support, has limited program input options. In fact, calling hAWK through Minimal
- App shows what hAWK would look like if it were repackaged as a stand–alone
- application. See “Calling hAWK through Minimal App” (in the “Advanced topics”
- chapter) for tips on using Minimal App with an editor to run hAWK programs.
-
- Major topics are marked with MPW-compatible marks, available in many editors by
- holding down the <Option> or <Command> key while clicking in the window’s title bar.
- You can jump to a section heading by selecting the heading in the table of contents and
- using the editor’s “Enter Selection”/“Find Again” commands. The “Active index” at
- the end of this manual is suitable for on-line use, consisting of line numbers rather
- than page numbers; to jump to the line for a reference in the index, select the
- corresponding line number and use the editor’s “Go to” command.
-
- If you change the content of this manual you will throw off the Active index, and will
- lose the marker locations also if the editor doesn’t manage MPW–compatible marks.
- However, feel free to add or delete markers, or change the font.
-
- Why bother to learn hAWK?
- • Many editing and formatting problems that crop up in the life of a C programmer
- can be solved with a simple hAWK program. Now you have a choice—grind out a
- series of mechanically–repeated key strokes, or dash off an elegant little program.
- And when it comes time to solve a problem, a typical hAWK program can be run
- with two mouse picks and a press of the <Return> key.
- • On the Mac alone, there are versions of AWK that run under the MPW shell, under
- A/UX, and now with hAWK there is a version that’s handy to use in conjunction with
- THINK C. Never mind all the DOS and Unix implementations—even on the Mac, hAWK
- is a widely–used language. You’re not learning a white elephant, here.
- • Need to prototype a “little” language? Try out an algorithm? Looking for an
- introduction to C that comes with air bags? This is it. For a sampling of what hAWK
- can do, see “About the supplied programs” below.
-
- Contents
- -----------
- Introduction
- Installing hAWK
- Where to go from here
- About hAWK
- From AWK to gAWK to hAWK
- What’s missing
- What’s new
- The calling application
- A typical hAWK run
- Running hAWK programs
- The setup dialog
- Concurrent and immediate modes
- Selecting your program
- Selecting input for a program
- Setting variables
- Library files
- Showing the results
- Saving the setup for a program
- Cancelling a run
- Standard input and output
- About the supplied programs
- hAWK program structure
- From start to finish
- Grouping and breaking lines
- The command line and ARGV[]
- Variables and constants
- Variable names and types
- Constants
- Record and field variables
- Built–in variables
- Local variables in functions
- Setting variables on the command line
- Conversion between numbers and strings
- Arrays
- Patterns
- Patterns and actions
- BEGIN and END
- Expressions as patterns
- String-matching patterns
- Regular expressions
- Compound patterns
- Range patterns
- Summary of patterns
- Actions
- Introduction
- A preview of “print”
- Expression operators
- Built–in numeric functions
- Built–in string and file functions
- Control-flow statements
- Empty statements
- User-defined functions
- Output
- The “print” statement
- The “printf” statement
- Output into files
- Closing files
- Input
- FS, the input field separator
- RS, the input record separator
- The “getline” function
- The “hAWK” function
- Advanced topics
- Other ways of specifying input files
- Beyond input records
- Calling hAWK through Minimal App
- Calling hAWK from your application
- What and how
- Getting started
- Add two calls in your code
- A minimal version
- Callbacks, and showing results
- Modifying hAWK
- Introduction
- hAWK THINK C project
- Source
- Libraries
- Active index
-
-
- -------------
- Introduction
- -------------
- hAWK is a Macintosh adaptation of AWK, a small programming language that is
- well suited to jobs involving text manipulation and pattern recognition. hAWK
- is not a stand-alone application, but is rather a CODE resource with a specific simple
- calling interface (called a "Drag_on Module"), and it is invoked by selecting "hAWK"
- from a menu in an application that can call Drag_on Modules.
-
- This manual will explain in more detail what hAWK is, and show you how to run hAWK
- programs. There are many useful programs supplied in the "hAWK programs" folder,
- each with complete instructions at the top so you can try them out as you go along; they
- range from very simple to rather complex, general purpose to very special purpose,
- and illustrate the wide range of hAWK’s abilities, from counting lines in a file to
- cross–referencing your C source. The chapter below entitled “About the supplied
- programs” provides an overview of the programs in the “hAWK programs” folder.
- These programs are not just useful as “examples to learn from”—they are, for the
- most part, nontrivial, and supply real answers to the daily problems of a C
- programmer.
-
- What is hAWK really? hAWK is what C could be if you weren't in a hurry. hAWK
- programs are relatively small, look rather like C code, and rely on powerful built-in
- capabilities and commands—capabilities like automatic reading of input files on a
- line-by-line basis, commands such as "gsub" which is, just on its own, as powerful
- as Grep. The focus is on text, but the text can be just about anything—the sample
- program “$Print_MENU_Resource”, for example, will take the hex representation
- of a MENU resource as retrieved by Read Resource and format it to be human–readable.
-
- The primary difference between hAWK and other versions of AWK lies in the method of
- running programs; hAWK’s setup dialog allows you to run programs with just a few
- mouse clicks, with typing needed only if you wish to assign initial values to variables
- before a run. This is mainly because hAWK can take advantage of the window and file
- handling abilities of the application that is used to call it, to offer the options of taking
- input for the hAWK program from text in the front window of the calling application,
- or from the list of files selected for multi–file operations. These generalised input
- specifications, “whatever’s in the front window” and “whatever’s selected for
- multi–file operations”, eliminate the need to type in a list of file names for a program
- to use as input. And since each program can remember the general input method you
- have selected for it, repeated runs of a program are reduced to: bringing the input to
- hAWK’s attention, either by bringing a text file to the front or by selecting files for
- multi–file searching; and then running the program with three mouse clicks. This all
- makes hAWK as easy to run as a macro language, and since AWK is a widely–used,
- full–featured programming language you should find it well worth the effort of
- learning.
-
- ---------------
- Installing hAWK
- ---------------
- If you can read this, then you’ve installed hAWK, since it is being shipped in
- compressed form these days. As a reminder, hAWK should be inside your
- "Drag_on Modules" folder, and this folder should be in the same folder that
- contains the calling application, at the same level. The "hAWK programs" folder
- should also be in the "Drag_on Modules" folder, and this manual can go anywhere.
-
- To verify that hAWK has been installed, start up an application that can call hAWK
- and then check the menus; you should see “hAWK” as one of the items. Select “hAWK”,
- and the setup dialog for hAWK will appear. Venture on ahead fearlessly if you like,
- armed with the magic talisman that holding down the <Command> key while typing
- a <period> will interrupt any running hAWK program.
-
- ------------------
- Where to go from here
- ------------------
- Read straight ahead here until you’ve tried out a few hAWK programs and are comfortable
- with the overall approach to running them. The supplied programs in the “hAWK
- programs” folder are worth exploring to get a feel for what hAWK can do—and you’ll
- likely find that several of them provide answers to problems little or big that you
- regularly face. The remainder of this manual delves into the inner workings of hAWK,
- necessary reading if you want to write your own hAWK programs (and who could
- resist?). If you make use of the markers in this manual for the chapter and
- section headings, and the active index at the end listing topics, you’ll be able to browse
- around almost as easily as with a printed book.
-
- This is a good–sized manual, and if you try to read straight through it at one sitting
- you’ll probably hurt your head. Just amble along at a gentle pace, and when ideas or
- questions pop up, you’ll find it well worth the effort if you take a moment to write
- a one or two–line hAWK program to try the notion out. Running a hAWK program
- takes just a few mouse clicks. The easiest way is with “$RunClip” (see chapter H).
-
- You can, if you wish, print this manual yourself. Aha, but what about that index, which
- lists line numbers rather than page numbers? Thought you might ask that—what you
- want, then, is a version of this manual with line numbers added at the beginning of each
- line. An ideal job for hAWK!
- 1 Use a “Save As” command to save this manual under a different name, such
- as “hAWK Manual” (or save it under the same name but in a different folder).
- 2 Select “hAWK” from the calling application’s menu, and the setup dialog will appear.
- Select “$AddLineNumbers” from the “Main program:” popup menu at the top, pick the
- “Select input file…” option from the “Take input from:” popup, and use the standard
- Open dialog that appears to select the copy of this manual that you just created.
- 3 Click “Run” and wait a bit, and you’re back in the calling application.
- 4 Open the copy of this manual. If you left it on–screen while running hAWK,
- choose “Revert” to see the changed version (you can force Revert to be enabled by
- typing one character in the window).
- 5 Print the result (change the font first, if you like).
- 6 Note that to include the pictures you will have to use ResEdit to copy them from the
- original manual to your copy of the manual, and use EnterAct to print. They deal with
- the setup dialog only, and you won’t miss them much if you don’t bother.
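The transformation $AddLineNumbers performs can be sketched in C as follows. This is a hypothetical sketch, not the supplied program: the function name `number_lines` and the output format (the line number, a tab, then the original line) are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return a newly allocated copy of `text` with a line
   number and a tab prefixed to every line. The caller frees the result.
   Lines end in '\n'; a final line without one is still numbered. */
char *number_lines(const char *text)
{
    size_t len = strlen(text);
    size_t lines = 0;
    for (size_t i = 0; i < len; i++)
        if (text[i] == '\n')
            lines++;
    if (len > 0 && text[len - 1] != '\n')
        lines++;

    /* Each prefix needs at most 20 digits plus a tab. */
    char *out = malloc(len + lines * 22 + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    unsigned long n = 0;
    const char *p = text;
    while (*p != '\0') {
        dst += sprintf(dst, "%lu\t", ++n);
        const char *end = strchr(p, '\n');
        size_t seg = end ? (size_t)(end - p + 1) : strlen(p);
        memcpy(dst, p, seg);   /* copy the line, including its '\n' */
        dst += seg;
        p += seg;
    }
    *dst = '\0';
    return out;
}
```

With this format, a line listed in the Active index can be found in the printout by reading the number at the left margin.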
-
-
- A very readable description of AWK (excluding the Macintosh variations of hAWK) can
- be found in
- "The AWK Programming Language"
- Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger
- Addison-Wesley, 1988. ISBN 0-201-07981-X.
- on the "Languages" or "unix" wall of your favourite bookstore.
-
- A more relaxed, though less ambitious, introduction can be found in
- "sed & awk"
- Dale Dougherty
- O’Reilly & Associates, Inc., 1991. ISBN 0-937175-59-5.
- The coverage of regular expressions is especially sympathetic.
-
- ----------
- About hAWK
- ----------
- From AWK to gAWK to hAWK
- hAWK is a Macintosh version of AWK, a pattern-recognition and data-manipulation
- language that is popular on unix systems. This version of hAWK is a modification of
- GAWK, the GNU Project's implementation of the AWK programming language, which
- differs in only minor ways from "classic" AWK. "hAWK" will be the name used
- below, except where differences from Gawk or AWK need pointing out.
-
- AWK has a venerable history, going all the way back to 1977 when Messrs. Aho,
- Weinberger, and Kernighan developed it at Bell Labs to fill in some small holes
- in Unix. The idea then was to write one or two–line programs to solve simple
- pattern–matching and text or number transforming problems—programs so small
- that you wouldn’t even bother to save them, just type them in on the fly, right on
- the command line. Over the years, users have pushed the limits of AWK, and many
- features have been added (user–definable functions being the nicest), and now
- multiple–page AWK programs are commonplace.
-
- GAWK is a Unix/IBM version of AWK, developed around 1986 by Paul Rubin and Jay
- Fenlason and copyrighted by the Free Software Foundation. It adds some useful
- enhancements to AWK, dealing mostly with files and variables.
-
- hAWK is essentially GAWK adjusted for the Macintosh, with the addition of a dialog
- interface to take advantage of windows and mice. If you wish to distribute hAWK, by
- the way, you should note that it is governed by the Free Software Foundation’s
- copyright restrictions (not too horrible) which you can find in the file “COPYING
- hAWK” in the source code folder for hAWK.
-
- What’s missing
- Pipes are missing. Pipes take a full–fledged shell to run, and most applications aren’t
- up to it. Since hAWK is packaged as a CODE resource to be called by any old application,
- pipes had to go. Similarly, the “system” command (which allows one to call other shell
- commands from within an AWK program) has been dropped.
-
- What’s new
- The interface is new. No more command line—most hAWK programs can be run with
- just a few mouse clicks, and typing is needed only if you want to set the value of
- variables before running the program.
-
- There are seven new built–in string functions, “lookup”, “sort”, “time”, “prompt”,
- “progress”, “getclip”, and “putclip”, described in “Built–in string and file functions” in the
- “Actions” chapter. Some new file and directory functions are also described there.
-
- The “lookup” function returns the type of a C term as an integer code (#define = 1,
- variable = 2, etc), useful when doing cross-referencing. It relies on the calling
- application for this diagnosis, so hAWK programs that use “lookup” should be called
- only through applications which support it (Minimal App doesn’t).
-
- The “sort” function is provided to (mostly) make up for the lack of a shell sorting
- function. It’s fast, and can do ASCII, numeric, or dictionary–order sorting of an array,
- in forward or reverse order.
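To make the three orderings concrete, here is a hypothetical C sketch built on the standard `qsort`. It is not hAWK’s implementation or its calling convention; in particular, “dictionary order” is assumed here to mean a case-insensitive comparison, and reverse order is obtained simply by negating the comparison result.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort orders; the names are illustrative, not hAWK's. */
enum sort_order { ASCII_ORDER, NUMERIC_ORDER, DICTIONARY_ORDER };

/* qsort passes no context, so the chosen order is kept in file statics. */
static enum sort_order current_order;
static int reverse_sort;

/* Case-insensitive comparison, standing in for "dictionary order". */
static int dict_cmp(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

static int compare(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    int result;

    switch (current_order) {
    case NUMERIC_ORDER: {
        double x = strtod(a, NULL), y = strtod(b, NULL);
        result = (x > y) - (x < y);
        break;
    }
    case DICTIONARY_ORDER:
        result = dict_cmp(a, b);
        break;
    default: /* ASCII_ORDER: plain byte-by-byte comparison */
        result = strcmp(a, b);
        break;
    }
    return reverse_sort ? -result : result;
}

/* Sort an array of strings in place. */
void sort_strings(const char **items, size_t count,
                  enum sort_order order, int reverse)
{
    current_order = order;
    reverse_sort = reverse;
    qsort(items, count, sizeof items[0], compare);
}
```

The difference between the orders shows up quickly: in ASCII order "10" sorts before "9", and "Cat" sorts before "apple" because capital letters have lower codes than lowercase ones.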
-
- The “time” function produces the current date and time, to the second.
-
- The “prompt” function prompts you with a dialog to enter some text, and returns what
- you enter as a string, as in
- X = prompt("Please enter a value for X:")
-
- The “progress” function allows you to show (and update) a message while a program is
- running.
-
- The “getclip” function returns a string holding the current contents of the calling
- application’s private clipboard, as of the moment of the call. This can be used to pass
- instructions or data to a hAWK program while it is running concurrently with your
- application (more on this, needless to say, below). Similarly, “putclip” puts a new
- string of text on the clipboard.
-
- As a partial replacement for the “system” command, any hAWK program can call any
- other hAWK program as a “subroutine”, via the “hAWK()” function. Using this
- function, a program can generate a special-purpose program and immediately
- execute it (eg $MFS_SuperReplace), or selectively execute a series of programs (eg
- $Chain). It also allows you to type in and run programs without saving them first (eg
- $RunClip). This function is described in its own chapter, “The hAWK function”.
-
- Three built–in variables have been added: RUNERR, STDPATH, and TIME. See “Built-in
- variables” in the “Variables and constants” chapter for details.
-
- hAWK uses the concept of standard input, output, and error, but strictly in the
- form of files with the fixed names $tempStdIn, $tempStdOut, and $tempStdErr.
- These files are created and written to as needed, and can be found in the same
- folder that contains your “Drag_on Modules” folder after you’ve begun running
- hAWK programs. These are temporary files, and will normally be overwritten
- by each hAWK program run.
-
- The regular expressions implemented in hAWK are full regular expressions, with the
- ability to tag subexpressions, match word boundaries, ignore case, and deal with
- multi–line strings. Just about anywhere else in this world, you’ll find either full
- regular expressions or the ability to tag subexpressions, but not both. One minute you
- want the “or” operator, the next minute you want to tag something—it gets rather
- frustrating. There is absolutely no good reason not to allow both together, so in hAWK
- you’ve got them. Speaking of gripes, most Greps limit you to a single line, which is
- not just frustrating but downright crippling. (By the way, another major
- improvement over Grep is that in AWK/hAWK your regular expression can be the
- string resulting from the evaluation of one or more variables, eg
-     digits = "[0123456789]+"
-     plus_or_minus = "[+-]?"
-     if (no_plus_or_minus)
-         integer_pattern = digits
-     else
-         integer_pattern = plus_or_minus digits
-     if ($1 ~ integer_pattern) print "integer:", $1
- —and a pleasant side–effect is that regular expressions can be very readable if you want.)
- For the details, see “Regular expressions” in the “Patterns” chapter.
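The same idea, building a pattern by joining strings and compiling it only at run time, can be sketched in C with the POSIX `<regex.h>` functions. This illustrates dynamic regular expressions in general, not hAWK’s own regular expression engine; the helper name `matches_integer` is hypothetical, and anchoring the pattern with `^` and `$` is an assumption made so that the whole string must be an integer.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: build the integer pattern from two pieces, as in
   the hAWK fragment above, then test `text` against it.
   Returns 1 on a match, 0 on no match, -1 if compilation fails. */
int matches_integer(const char *text, int no_plus_or_minus)
{
    const char *digits = "[0123456789]+";
    const char *plus_or_minus = "[+-]?";
    char pattern[64];
    regex_t re;

    /* The pattern is assembled at run time, like a string variable. */
    snprintf(pattern, sizeof pattern, "^%s%s$",
             no_plus_or_minus ? "" : plus_or_minus, digits);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int result = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return result;
}
```

Because the pieces are ordinary strings, a readable name such as `digits` can be reused in many larger patterns without retyping the bracket expression.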
-
- If the calling application supports the notion, your hAWK programs will by default
- run concurrently with your calling app. This means you start up the hAWK program,
- and then go back to working in your application (or background it and work somewhere
- else) until the hAWK program is done. The “prompt” and “progress” functions do not
- work in this concurrent mode, so programs that use them should be run in the
- “immediate” mode instead: hold down the <Shift> key while choosing “hAWK”
- from the calling application’s menu. In
- immediate mode, you will be locked out of the calling application until the hAWK
- program ends. Programs will run more slowly in concurrent mode (the speed
- drop being slightly greater if you put the calling application in the background),
- but this is usually more than compensated for by being able to carry on with other
- things, rather than just sit there watching the watch cursor. The running hAWK program
- usually doesn’t affect application performance very much. For more about this,
- see “Concurrent and immediate modes” in the “Running hAWK programs” chapter.
-
- The calling application
- Any C-based application can call hAWK and other Drag_on Modules, as the source
- code for Minimal App demonstrates. The level of interaction between hAWK and
- the calling application is up to the author of the calling application, and can vary
- more or less according to the following table:
-
- Level       Support for interactive features
- ----------  ------------------------------------------------------------------
- minimal     none (no result showing; input limited to one specific file)
- basic text  pass front text window as input option; show stdout after a run
- full text   everything in basic text, plus pass list of selected files as input
- full        everything in full text, plus diagnose the type of a C code term
-             and pass the clipboard
-
- If the application you are using provides only minimal support, then some extra
- manual steps are needed to persuade a hAWK program to take input from the current
- front text file or a list of files, and to view the results of a run; see “Calling hAWK
- through Minimal App” in the “Advanced topics” chapter for some tips on this. The
- discussion there is “advanced” only if you want to understand all the details—you
- can use the methods described there by rote (for example, if it says paste this bit
- of code into the top of a program and you’ll have support for taking input from a list
- of files, you can do it now and worry about how it works later).
-
- ------------------
- A typical hAWK run
- ------------------
- Have you installed hAWK yet? If not, now would be a good time (see above).
-
- We’ll assume that you’re calling hAWK through an application that supports passing all
- or part of the front text window as input options, and showing stdout after a run, to
- make life simpler. If you don’t have such an application, you can use Minimal App in
- conjunction with whatever editor you are using to view this file, as described in the
- “Advanced topics” section “Calling hAWK through Minimal App”.
-
- One of the programs supplied with hAWK is “$EnumSwitch”, which takes a list of
- enum constants and generates a “switch” statement based on them. It’s contained in the
- folder “hAWK programs”, which is inside the “Drag_on Modules” folder—you might
- like to take a look at it first....
-
- OK, here we go: first, move this window on your screen so that you can see the next few
- lines while the hAWK setup dialog is in front (select hAWK now from the appropriate
- menu and Cancel to see where it appears). Now select the following line of text:
- {first, second, third, fourth, twilightZone = -99}
- Is it highlighted? Good. Now select hAWK from the menu; when the dialog
- appears, select “$EnumSwitch” from the top popup menu called “Main program:”, and
- finally, click on the Run button or hit <Return> on the keyboard.
-
- You should be back in the calling application now, with a switch statement coming up in a
- window called “$tempStdOut”. hAWK took the line that you highlighted above, stripped
- it down, built a switch statement out of the words, and wrote the results to the disk file
- “$tempStdOut”. The calling application is now showing you the resulting file, with
- contents selected and ready for pasting into your source code.
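For readers curious about what such a program has to do, here is a hypothetical C sketch of the same transformation. It is not the supplied $EnumSwitch program, and the layout of the generated code (the `switch (value)` header, tab indentation, one `break` per case) is an assumption; the real program’s output may differ.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Append `text` to `out` if it fits; otherwise leave `out` unchanged. */
static void append(char *out, size_t size, size_t *used, const char *text)
{
    size_t len = strlen(text);
    if (*used + len < size) {
        memcpy(out + *used, text, len + 1);
        *used += len;
    }
}

/* Hypothetical sketch: turn "{first, second, twilightZone = -99}" into a
   switch statement with one case per enum constant. Braces, commas and
   spaces separate names; an "= value" initializer is skipped.
   Returns the number of case labels generated. */
int enum_switch(const char *list, char *out, size_t size)
{
    size_t used = 0;
    int cases = 0;
    const char *p = list;

    if (size == 0)
        return 0;
    out[0] = '\0';
    append(out, size, &used, "switch (value)\n{\n");

    while (*p != '\0') {
        /* Skip separators between constants. */
        while (*p == '{' || *p == '}' || *p == ',' ||
               isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        /* Read the constant's name. */
        const char *start = p;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        size_t len = (size_t)(p - start);

        /* Skip an initializer such as "= -99" up to the next separator. */
        while (*p != '\0' && *p != ',' && *p != '}')
            p++;

        if (len > 0 && len < 64) {
            char line[96];
            snprintf(line, sizeof line, "\tcase %.*s:\n\t\tbreak;\n",
                     (int)len, start);
            append(out, size, &used, line);
            cases++;
        }
    }
    append(out, size, &used, "}\n");
    return cases;
}
```

Given the line selected in the walkthrough, this sketch produces five case labels, and the initializer "-99" does not appear in the output.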
-
- Most hAWK programs can be run this easily. Now, the full story.
-
- ------------------
- Running hAWK programs
- ------------------
- The setup dialog
- [Picture: the hAWK setup dialog]
- When you select hAWK, the above “setup” dialog always appears first. A typical
- program run consists of: setting up the input to be ready for hAWK; selecting hAWK to
- see the setup dialog; selecting the program to run from the “Main program” popup
- menu; and hitting the Run button. If you have variables in the program that need
- to be set just before running the program, then you can set up to 10 variables by
- using the dialog that appears when you click the “Set variables” button. The input
- option, variable settings, and names of any associated libraries can all be saved with
- a hAWK program via the “Save settings” button, so that when you run a program again
- you’ll need to adjust the setup only for things that have changed (typically only the
- values to be initially assigned to variables, if anything).
-
- Concurrent and immediate modes
- With most little languages, when you run a program that’s all you do—run the
- program. No continuing to work in your primary application, let alone switching to
- another application. In the rare case when you want hAWK to completely take over your
- Macintosh, locking you out of the calling application, hold down the <Shift> or <Option>
- key while selecting “hAWK” from the calling application’s menus. If the program uses
- the “prompt” or “progress” functions, it will be necessary to run in this
- “immediate” mode, since they just return null results in the “concurrent” mode.
-
- In all other cases, just select “hAWK” from the calling application’s menus
- without holding down the <Shift> or <Option> key, and if the calling application
- supports it, you’ll be returned almost immediately to your application, able
- to carry on working there while the hAWK program runs at the
- same time. This “concurrent” mode of running programs does not greatly
- slow down the calling application or any other application that you switch to.
- The hAWK program itself will run more slowly than in immediate mode, often
- taking about 50% longer—but if you don’t need the results in a huge rush,
- stick to the concurrent mode and just forget about the hAWK program until
- it winds up with a beep.
-
- While a hAWK program is running concurrently, you won’t be able to run any
- additional Drag_on Modules. This is because they all use the same standard output
- file ($tempStdOut), and a fight could develop over who gets to write to it.
-
- While a hAWK program is running concurrently, you will not be able to save to
- any files that hAWK is using. Regular input files are accessed only one at a time,
- and the standard input/output/error files will normally be “busy” from
- beginning to end of the run. In addition, any files being read from or written to
- via redirection (see “Output” and “Input” chapters) will not be writeable.
- However, you will be able to open any file that hAWK is using to take a look
- at it. With a lengthy program, you can check in with hAWK now and then by
- opening (or reverting) $tempStdOut to get a snapshot of how things are
- progressing.
-
- See the supplied program “$LogDaemon” for an example of a hAWK program which
- idles unobtrusively underneath your calling application, waiting to take special
- action when you copy a specific instruction to the application’s clipboard. A
- “daemon”, by the way, is an invisible, powerful spirit with your best interests
- at heart. It “possesses” your Macintosh, in a nice way. And the name is a bit more
- entertaining than the plain old “forks” and “threads” etc.
-
- Concurrent execution is currently supported by: EnterAct.
-
- Selecting your program
- [Picture: selecting a program with the “Main program:” popup]
- The “Main program:” popup at the top of the setup dialog lists all text files in the
- “hAWK programs” folder whose names begin with a dollar sign ($). This list is
- rebuilt each time you call hAWK. If a program is not listed in the popup, you can still
- run it by picking “Select unlisted program”, the first item in the “Main program”
- popup, and then using the standard Open dialog that appears to select the program—note
- it could be in another folder, or in the “hAWK programs” folder but not shown in the
- popup simply because its name doesn’t start with a “$”. You can avoid clutter in this
- popup by starting the names of only your most popular hAWK programs with a “$”, so
- that other less–frequently used programs won’t be shown in the popup—if they are in
- the hAWK programs folder, they will still be close at hand.
-
- Selecting input for a program
- [Picture: the “Take input from” popup]
- This is one of hAWK’s nicest features, allowing hAWK to interact with the calling
- application to provide quick input file specification. Two additional ways of specifying
- input files, not listed in the “Take input from” popup, are described in the
- “Advanced topics” chapter, in “Other ways of specifying input files”.
-
- Under the “Take input from” popup menu, the options “Front text selection” and
- “All of front text” refer to the text window that happens to be in front just before you
- call hAWK from the calling app’s menu. According to what you select here, all or just the
- selected part of the text in the front window will be written to a temporary file called
- “$tempStdIn”, and passed to your program as the input file to use. If your program
- is to be run using one of these options, bring the text window containing the text to be used
- as input to the front just before calling hAWK, and if you’ll be using the “Front text
- selection” option, you should select the text as well. For an example, see
- “A typical hAWK run” above, where this manual itself served as the front text.
-
- The “MFS selected files” option in the “Take input from” popup refers to a list of files
- selected in the calling application for multi–file operations (typically this list is used
- mainly for multi–file searching in the calling application, and you construct it by
- placing check marks or bullets • beside file names—see the calling app’s manual for
- details). With this option selected, all files selected for multi–file operations will be
- passed to the hAWK program as input. This means you can set up a list of files in the
- calling app, and then have your hAWK program take its input from those files, from
- one file to hundreds. One limitation of this approach is that you can’t specify the exact
- sequence in which the files will be dealt with. With many programs, this is not a
- problem (multi–file search and replace, for example). To treat input files in a specific
- order see “Other ways of specifying input files” in the “Advanced topics” chapter.
-
- The “Select input file…” option allows you to use a standard Open dialog to pick one
- specific file to use as input for a hAWK program. As with all other aspects of the
- setup dialog, if you click “Save settings” the name of the file you select will be saved
- with the program itself, and restored for the next run.
-
- Aside from “Select input file…”, input options will not be shown if they are not
- currently available.
-
- In rare cases, you may need no input at all for your program. To ensure that no
- input is passed, pick the “Select input file…” input option, cancel the Open dialog
- that appears, and then click the “Save settings” button. The input option for your
- program will thereafter read “Select input file…”, as though imploring you to
- pick one, but no input will be sent to your program. It’s harmless if input is sent
- to a program that doesn’t want any, the only penalty being time lost if a massive
- amount of input is accidentally ordered along for the ride.
-
- Setting variables
- [Picture: the “Set variables” dialog]
- The “Set variables” button allows you to preset the values of variables just before
- running a program, without having to edit the program itself. As you can see from the
- picture, it’s a simple matter of typing the variable name, followed by an “equals”
- sign, followed by the value of the variable, either a number or a string.
-
- Quotes should not be used to surround strings; just enter the string itself. Any
- spaces between the “=” and the value will count as part of the value, so normally
- you should enter the value with no spaces between the equals sign and the value.
- For example,
- find =spot
- and
- find = spot
- produce different results. Spaces are optional between the name of the variable
- and the equals sign.
-
- The limit on the length of the variable assignment, including the name of the
- variable, is 100 characters. Up to 10 variables may be given values this way.
-
- Special characters such as tabs and returns can be placed in a string by using the
- standard escape sequences familiar from C, eg
- find =\tspot\n
- assigns to “find” the string consisting of a tab, followed by s-p-o-t, followed
- by a carriage return.
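The rules above (spaces allowed before the equals sign, every character after it kept as part of the value, C escape sequences recognized, and a 100-character limit) can be made concrete with a small C sketch. This is hypothetical, not hAWK’s own parser: the function name `parse_assignment`, the buffer handling, and the exact set of escapes handled are assumptions. In particular, unrecognized escapes such as `\.` are kept intact here on the assumption that a regular expression typed into the dialog should reach the program unchanged.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ASSIGNMENT 100 /* length limit stated in this manual */

/* Hypothetical sketch: split "name = value" into its parts.
   Spaces before '=' are ignored; everything after '=' is the value,
   with \t, \n and \\ translated as in C. Returns 1 on success. */
int parse_assignment(const char *text, char *name, size_t name_size,
                     char *value, size_t value_size)
{
    const char *eq = strchr(text, '=');
    if (eq == NULL || value_size == 0 || strlen(text) > MAX_ASSIGNMENT)
        return 0;

    /* Name: the text before '=', with trailing spaces dropped. */
    size_t len = (size_t)(eq - text);
    while (len > 0 && isspace((unsigned char)text[len - 1]))
        len--;
    if (len == 0 || len >= name_size)
        return 0;
    memcpy(name, text, len);
    name[len] = '\0';

    /* Value: everything after '=', including any leading spaces. */
    size_t n = 0;
    for (const char *p = eq + 1; *p != '\0'; p++) {
        char c = *p;
        if (c == '\\' && p[1] != '\0') {
            p++;
            switch (*p) {
            case 't':  c = '\t'; break;
            case 'n':  c = '\n'; break; /* a return on the Macintosh */
            case '\\': c = '\\'; break;
            default:   /* keep other escapes, e.g. "\." in a regex */
                if (n + 1 >= value_size)
                    return 0;
                value[n++] = '\\';
                c = *p;
                break;
            }
        }
        if (n + 1 >= value_size)
            return 0;
        value[n++] = c;
    }
    value[n] = '\0';
    return 1;
}
```

Run against the two examples in this section, “find =spot” yields the value “spot” while “find = spot” yields “ spot” with a leading space, which is why the two assignments behave differently.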
-
- You can also assign the value of a (dynamic) regular expression using the “Set
- variables” dialog, for example
- find =\.#[A-Za-z_]+ (never mind what it means for now)
- —note there is no need to enclose it in forward slashes, and many characters must
- be escaped with a backslash if you want them matched literally (the section
- “Regular expressions” in the “Patterns” chapter explains the nuances).
-
- Clicking the “Save settings” button will save your variable assignments for
- subsequent runs. Hence you’ll need to use the “Set variables” dialog only
- when the preset value of some variable changes.
-
- If variable presets exist for a program then the “Set variables” button will
- acquire a gray outline as a reminder that some variables may need changing
- before running the program. With some programs (such as $CompareFiles) you’ll
- almost never change the preset variables, but with others (such as $MFS_SuperLister)
- you’ll want to change one or more variables before almost every run.
-
- Library files
- Technically, this is an advanced topic, but it’s simple to use. If you develop
- some general–purpose functions, such as sorting routines, that you wish to
- use in several programs without duplicating the function definitions within
- each program, you can save the functions in a separate file and add that file
- to each main program as a library. The contents of the library file are
- simply appended to the contents of the main program before running it, so the
- library can in fact contain any valid hAWK statements. However, to preserve
- sanity, libraries should be restricted to just functions.
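- 
- As a rough sketch of what “appended” means, the C function below (hypothetical name
- combine_program) joins the main program text and each library's text into one
- buffer. Putting a newline between the pieces is an assumption, made so that a
- library never runs onto the last line of the main program.
```c
#include <stdlib.h>
#include <string.h>

/* Join the main program and its libraries, in order, into one malloc'd
   string. The caller frees the result; returns NULL if out of memory. */
char *combine_program(const char *main_text, const char *libs[], size_t n_libs)
{
    size_t total = strlen(main_text) + 1;     /* +1 for the final '\0' */
    size_t i;
    char *all;

    for (i = 0; i < n_libs; ++i)
        total += 1 + strlen(libs[i]);         /* newline + library text */

    all = malloc(total);
    if (all == NULL)
        return NULL;

    strcpy(all, main_text);
    for (i = 0; i < n_libs; ++i) {
        strcat(all, "\n");
        strcat(all, libs[i]);
    }
    return all;
}
```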
-
- To add a library file to a main program:
- 1 Use the “Main program” popup to select the program
- 2 Use the “Select library…” item in the “Libraries:” popup to add the library
- by using the standard Open dialog that appears.
- 3 Clicking the “Save settings” button will preserve your selection of libraries
- for subsequent runs.
-
- To delete a library, select it using the “Libraries:” popup.
-
- One sample library is included, in the file “SortLibrary”. It is not used in the
- sample programs; it’s just an example (PLEASE NOTE: hAWK has its own
- built–in sort function, which is very fast). Little is lost if you follow the
- policy of not using libraries—programs are easier to read if all the code is in one place.
-
- Showing the results
- Output from hAWK programs is produced by “print” or “printf” statements,
- which send their output to the file “$tempStdOut” unless you explicitly
- redirect it. For example,
- print "some text"
- will print the string "some text" to $tempStdOut.
-
- The file $tempStdOut is created and managed for you, and most hAWK programs will
- send at least some output to this file. If you would like to see this file after the program
- is finished, put a check mark in the “Show stdout” checkbox in the setup dialog just
- before running the program. When the program is done, the calling application will
- then show you the $tempStdOut file in a window, if it is able to. If the calling
- application doesn’t support showing stdout, you’ll have to manually Open or Revert the
- $tempStdOut file using your editor (for more on this see “Calling hAWK through
- Minimal App” in the “Advanced topics” chapter).
-
- Place a check mark in the “Select all of stdout” check box to have all of the output
- in the $tempStdOut window selected at the end of the program run. This is handy
- if you’ll be wanting to copy the entire output and paste it in elsewhere.
-
- Saving the setup for a program
- The “Save settings” button saves away your selection of options for a program, so that
- they will be restored for subsequent runs of the program. These options are saved with
- the program itself, in a special resource. The saved options are:
- 1 The names of any libraries associated with the program
- 2 Names and values of any preset variables
- 3 Your choice of input option, including the input file name if you have used
- the “Select input file…” option to pick a specific file.
- 4 Your output options, in the checkboxes “Show stdout” and “Select all of stdout”.
-
- During the first run of a program that you have written, you should set up the options
- you want and then click the “Save settings” button. Subsequent runs will then consist
- of just these steps:
- 1 Select “hAWK” from the calling application’s menu
- 2 Use the “Main program” popup to select the program
- 3 Use the “Set variables” button if needed to put in new values for variables
- (many hAWK programs don’t need this)
- 4 Click the Run button.
-
- Occasionally, you may want to run a program using a different input option, for example
- run it using “MFS selected files” rather than “All of front text”. This is simply a
- matter of selecting the new input option from the “Take input from” popup just before
- running the program. If you want the input option to be permanently changed for the
- program, click the “Save settings” button after picking the new input option.
-
- Cancelling a run
- To cancel a hAWK program, hold down the <Command> key while typing a <period>.
- Program execution should cease within one second.
-
- ------------------
- Standard input and output
- ------------------
- Drag_on Modules such as hAWK and Read Resource use three disk files to communicate
- with you and with the calling application. These text files carry the burden of standard
- input/output for Drag_on Modules. If a Drag_on Module requires a large chunk of input
- that is not already in an appropriate disk file, the input will be written to the standard
- input file “$tempStdIn”. All normal output from Drag_on Modules is sent to the file
- “$tempStdOut” unless you specify otherwise. If errors pop up while the Drag_on
- Module is running, error messages will be written to the file “$tempStdErr”. These
- files are all created and written to automatically as needed, and can be found in the same
- folder that contains your “Drag_on Modules” folder.
-
- The file of main interest here is $tempStdOut, which typically holds the results of a
- Drag_on Module run. Drag_on Modules don’t show you this file, but can request that the
- calling application show it to you. This is always the case with Read Resource, and is
- optional with hAWK—it depends on whether you put a check in the “Show stdout”
- checkbox in the setup dialog. All of the supplied hAWK programs that write output to
- $tempStdOut have saved settings that include putting a check in this box.
-
- Because the results of Drag_on Module runs are by default written to a fixed text
- file, you can easily pass the output from one run to the input of another run. For
- example, Read Resource creates a formatted text version of a resource and writes
- the results to $tempStdOut, which is then shown to you by the calling application.
- You can then call a hAWK program to further process this output, by leaving
- the $tempStdOut window in front and having the hAWK program take its input
- from the front window (pick the “All of front text” option from the “Take input
- from:” popup menu). And you can pass the output from one hAWK program to
- the input of another in the same way.
-
- A Drag_on Module can only request that the calling application show you the
- $tempStdOut file, but whether or not it does so is up to the author of the calling
- application. If it doesn’t, you’ll have to Open or Revert $tempStdOut yourself
- in order to see the results.
-
- The contents of $tempStdOut are indeed temporary, and will be overwritten by the
- next hAWK program, or indeed any other Drag_on Module, that you run. If you want
- a permanent copy of the output from a program, use “Save As” to save $tempStdOut
- under a new name, or copy the contents to a working window.
-
- hAWK always takes input from a file, and if you are using one of the “front text”
- options for input then hAWK will write a copy of the front text to $tempStdIn before
- running your program. Output from hAWK programs, which is generated by “print”
- and “printf” statements, can be explicitly redirected to any file, but if no redirection
- is provided then by default the output from the program is sent to $tempStdOut. The
- file “$tempStdErr” will hold error messages if problems pop up while running a
- program.
-
- Sometimes you’ll want to take input directly from the file $tempStdOut, without
- bothering to use the above method of opening the file and bringing its window to
- the front. It is perfectly OK to select $tempStdOut as the input file using the
- “Select input file...” option under the “Take input from:” popup. The contents
- of $tempStdOut just BEFORE the run will be used as the input, and input from
- this “old” version of $tempStdOut will not be affected by anything you write
- to $tempStdOut during the execution of your program. Actually, your old
- $tempStdOut will be renamed to $tempOutAsInput just before the run, and
- the file name your program receives will also be changed. This bit of subterfuge
- is necessary since it is not possible to randomly read and write the same file
- without things getting horribly confused.
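- 
- The renaming step might look roughly like the C sketch below. The function name is
- hypothetical, and it compares plain file names, whereas hAWK itself works with full
- path names; treat it as an illustration of the idea, not as hAWK's code.
```c
#include <stdio.h>
#include <string.h>

/* If the chosen input file is the standard output file, move it out of
   the way so the program reads the old contents while print writes a
   fresh $tempStdOut. Returns the name the program should read from. */
const char *protect_stdout_input(const char *input_name)
{
    static const char stdout_name[] = "$tempStdOut";
    static const char renamed[] = "$tempOutAsInput";

    if (strcmp(input_name, stdout_name) != 0)
        return input_name;              /* some other file: leave it alone */

    remove(renamed);                    /* discard any copy from a past run */
    if (rename(stdout_name, renamed) != 0)
        return input_name;              /* rename failed: keep the old name */
    return renamed;
}
```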
-
- ------------------------
- About the supplied programs
- ------------------------
- For the most part, the programs you’ll find in the “hAWK programs” folder do useful
- things (from the point of view of a C programmer), with just a few of them being of the
- traditional “completely useless but illustrating some basic point” kind that are often
- foisted on innocent customers by authors who have run out of steam before writing the
- manual. There are nearly as many categories of supplied programs as there are
- supplied programs, so the following list with brief descriptions is in simple
- alphabetical order. The descriptions are brief here because each supplied program
- contains a detailed explanation of what it does and how to use it, at the top.
-
- “$RunClip” provides a handy way to run small programs as you explore hAWK,
- without having to save them to disk first. You’ll find instructions below, and at
- the top of the $RunClip file.
-
- Unless otherwise mentioned, a program sends its output to the file $tempStdOut, and
- you will be shown the contents of this file by the calling application at the end of the
- run (if it is able to do so). Most programs will accept input from any source, but then
- again most programs are especially useful with just one or two input sources.
- $EnumSwitch, for example, expects a comma–separated list of enum constants as
- input, normally provided by selecting the enum constants in a source code window and
- taking input for $EnumSwitch from the selected front text. Running this program on a
- batch of MFS selected files is possible, but wouldn’t produce very useful results. Once
- you understand roughly what a program does, you should be able to judge what sorts of
- input are appropriate for it.
-
- The detailed instructions for running a program can be found at the top of the listing
- for the program itself, and you should read through those before running a program
- for the first time. For example, with $MFSLister you have to tell it what string
- to search for, and this is done by setting a variable with the
- “Set variables” button.
-
- Programs which make essential use of the “progress” or “prompt” functions
- should be run in “immediate” mode (see “Running hAWK programs”,
- section “Concurrent and immediate modes”). To run a program in immediate
- mode, hold down the <Shift> or <Option> key while selecting “hAWK” from
- your application’s menus. Programs that should be run in immediate mode
- are marked with (IMM) just after the program name below.
-
- $AddLineNumbers: will add line numbers to a file. Takes input from one specific
- file, and overwrites the contents of the file. Doesn’t number blank lines.
- $Chain (IMM): allows you to run one or more small canned programs on your input,
- the first program being executed using whatever input you specify, and
- the following programs if any taking their input from stdout. You type
- in the names of the programs to run in a dialog box, and they are executed
- from left to right in the order you typed them. Effectively serves as a
- “library” of small tasks. Illustrates using the hAWK() function to execute
- a sequence of programs, repeatedly taking input from stdout, and the
- “prompt” dialog box.
- $Comments: extracts lines that contain C comments. Or rather, at least all
- lines that contain comments.
- $CompareFiles: prints differences between two versions of a file; for use with
- the “MFS selected files” option. Has a couple of options, but should almost
- always work fine with the defaults—see instructions if results seem suspicious.
- Lengthy miscompares (over 100 lines) will cause it to bog down.
- Demonstrates doing everything with functions rather than pattern–action blocks.
- $DefineSwitch: generates a “switch” statement, with cases created from a list
- of #defined constants. Normally takes input from the selection in your front
- text window, output is shown selected in $tempStdOut for copying to your
- working window.
- $EchoFileNames: for use with the “MFS selected files” option, creates a list of
- the file names that were selected.
- $EchoFullPathNames: like $EchoFileNames, but generates full path names in
- the general form “Disk:folder:folder1:...:folderN:filename”. Full path names
- are required when redirecting input and output of hAWK programs.
- $EnumSwitch: like $DefineSwitch, but generates the cases for the switch from
- a comma–separated list of words, typically enum constants. Initializations
- for any of the constants are ignored.
- $ExtractExternRefs: lists all C declarations encountered that begin with “extern”.
- Fast and simple, but will stumble if it encounters “extern” as the first word
- in a comment. (Exercise: steal the comment–skipping code from $XRef
- to fix this little problem.)
- $FilesInOrderTest: discussed in the “Advanced topics” chapter way down below.
- Demonstrates the technique of taking input from an arbitrary
- list of files, the list itself being the sole input you pass to the program.
- $FindSetVolEtc: an example of a small program knocked off in a minute to
- solve a specific search problem. Searches for a list of specific terms, prints
- the file name and line number where found, together with the context of the find.
- $FrequencyWord: lists unique words in one or more documents, in declining order
- of frequency. Demonstrates associative arrays and the sort() function. A companion
- to $WordFrequency.
- $List_Potential_C_Locals: feed this the body of a C function, and it will return a
- list of candidates for declaration as local variables within the function. Contains
- a near-complete lexical analyser for C, and produces best results if the calling
- application supports the “lookup” function.
- $Lockout (IMM): a pathological excess. MUST be stopped with <Command><period>. Displays
- a marquee–style message in Chicago or “giant” while you go to lunch. Trivial, but
- the code itself is worth looking at (it can archive giant messages to files,
- demonstrates two–dimensional arrays, implements severe abuse of the
- progress() function). You can set the message before running, by changing
- the “message” variable. Some other options available.
- $LogDaemon: the only supplied program that must be run in concurrent mode.
- It waits around until you copy the (almost) word "logit", flashes the
- menu bar to acknowledge, and then will append the NEXT bit of text you
- copy to a specific file, together with a date stamp. Then another flash to
- signal that it’s done. This program runs until you type <Command><period>.
- See instructions before using, since you’ll need to change the name of the log file.
- $LongestLines: will print out a list of the longest lines in one or more files. Use
- “Set variables” to set how many lines to print, and how many spaces in a tab
- before running. Properly converts tabs to spaces for calculating lengths,
- illustrates several basic string functions.
- $LookupTest: a demonstration of the lookup() built–in function.
- $MFSLister: searches for a string or a regular expression (restricted to checking
- one line at a time). Prints file name and line number where found, with optional
- printing of the line containing the match.
- $MFS_SuperLister: searches for a regular expression or plain text involving
- variable white space, can match it even if it spans a variable number of lines (try
- that with Grep!). Lists file name and line where found. It’s up to you to
- provide the text or regular expression. The innards are much like $MFS_SuperReplace.
- $MFS_SuperReplace: multi-file search and replace, searching for a regular
- expression or a string of literal text that can span a variable number of lines.
- Replacement text can replace or extend the pattern found. Alters the original files,
- fully documents changes to stdout. Demonstrates using the hAWK() function
- to selectively alter and execute a program, handling a variable number of
- input lines at once in a “rolling buffer”.
- $Print_MENU_Resource: given the result of Read Resource on a MENU resource,
- this program prints a nicely–formatted version of the menu. A sample for doing
- your own custom resource or data formatting and content verification, including
- all of the necessary basic functions for doing so.
- $Print_MPSR_1007: given the result of Read Resource on a “MPSR 1007” resource
- (ie marks for a text file), prints out a nice version (see also $Print_MENU_Resource).
- $printNF: trivial, prints the number of fields in each input line.
- $ProgressTest, $PromptTest (IMM): demonstrate the prompt() and progress() functions.
- (The ultimate progress() example is $Lockout; for a nice little prompt()
- example, see $YoungMath).
- $RoughIndexer: if you dream of automatically generating an index, you can
- start here.
- $RunClip: for short, disposable programs to be run concurrently (note that $Type&Run
- only runs in immediate mode). The calling application must support passing its
- clipboard to hAWK (eg EnterAct). Create your program in the calling app, Copy it,
- bring input to hAWK's attention (eg front text or a multi-file selection), then call
- up hAWK and select and run $RunClip. Your copied program will be saved to the file
- “$hAWKTempProgram”, and then executed using the built-in hAWK() function.
- $SortTest: a test of the built–in sort() function, doing dictionary order. For
- a real use, see $WordFrequency.
- $SortTest_Nums: a sort() test on numbers. Uses rand() to generate the numbers.
- $StubFunctions: given a list of C function prototypes, generates empty function
- shells for the function definitions.
- $TabsToSpaces: converts tabs to spaces in one or more documents, replacing each tab
- by the appropriate number of spaces (anywhere from 1 to “spaces_in_tabs”),
- consistent with the tab interpretation of THINK C et al. You set the
- number of spaces in a tab with “Set variables”, and also whether to overwrite
- the original file or make a copy with a new name. Demonstrates some
- basic file–handling methods.
- $Time: just prints out the time, using the TIME built–in variable, and the
- time() function for comparison.
- $TwoColumnsRight: given a list of numbers in two columns, right–justifies
- the numbers in the columns. Demonstrates dynamically building
- a printf() format string with variables and string concatenation.
- $Type&Run (IMM): for short, disposable programs, use the dialog box presented by this
- program to type in and run your one or two-liner. Since <Return> means “OK”
- in the dialog, use <Command><Return> to advance to a new line. Illustrates using
- the hAWK() function to save and execute a program.
- $Uppercase: changes the first letter in each input field to upper case if
- it is a lower case letter. Uses match(), sub(), substr().
- $Whazzat: translates C declarations into English. Works best if the calling
- application supports the “lookup” function so that special terms in your
- declaration (typedefs, struct tags etc) can be diagnosed.
- Illustrates using functions instead of pattern–action blocks, retrieving tokens
- with string functions while parsing, reformatting long lines for output.
- $WordFrequency: a “classic” use for AWK - print sorted list of unique words
- in the input, together with the number of times each word is used.
- $XRef: generates file and line number listing for your choice of terms in C source
- code. Illustrates the hAWK() function, sorting. The calling application must
- support the “lookup” function (see “Built–in string and file functions” in the
- “Actions” chapter).
- $XRef_Full: like $XRef, but doesn’t skip comments and strings.
- $YoungMath (IMM): demonstrates the prompt() function while urging you to add
- numbers.
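- 
- Several of the programs above ($TabsToSpaces, $LongestLines) depend on expanding
- each tab to the next tab stop. A minimal C sketch of that rule, assuming columns
- numbered from 0 and a positive spaces_in_tabs (the function name expand_tabs is
- hypothetical):
```c
#include <stddef.h>

/* Replace each tab in `in` by 1 to spaces_in_tabs spaces, enough to reach
   the next multiple of spaces_in_tabs. spaces_in_tabs must be > 0 and
   `out` must be large enough. Returns the expanded length, which is the
   line length $LongestLines would compare. */
size_t expand_tabs(const char *in, char *out, int spaces_in_tabs)
{
    size_t col = 0;

    for (; *in != '\0'; ++in) {
        if (*in == '\t') {
            do
                out[col++] = ' ';
            while (col % (size_t)spaces_in_tabs != 0);
        } else {
            out[col++] = *in;
        }
    }
    out[col] = '\0';
    return col;
}
```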
-
- ---------------------
- hAWK program structure
- ---------------------
- From start to finish
- A typical hAWK program run progresses as follows:
- 1 From the hAWK setup dialog, specify the main program to be run, add any library files
- that go with it (optional), specify initial values for variables (optional), and build
- a list of input text files for the program to work on (optional, but almost always
- included).
- 2 Collect the main program and libraries together into one big program. Reduce it
- to a form more suitable for interpretation. Assign initial values to variables if you
- have provided any. The list of input files is made available to the program, in the
- array ARGV[] of file names.
- 3 Execute the program: by default, hAWK automatically reads the text from the input
- files into memory, one “record” at a time (the default is that a line is a record). If
- a record matches one of your specified patterns, then action is taken. Statements may
- optionally be executed before and after the input is dealt with. Schematically, a
- generic hAWK program looks like
- #An abstract hAWK program:
- BEGIN {beginning statements}
- pattern1 {action statements for pattern1}
- ...
- patternN {action statements for patternN}
- END {ending statements}
- (--supporting function definitions--)
- and the corresponding program execution proceeds as follows:
- • execute any supplied BEGIN statements
- • read the input files into memory, one record at a time; for each record
- check all patterns; if the pattern is TRUE for the current input record,
- execute the associated action statements; in C this would look like:
- while (get_another_input_record())
- {
- for (pattern1 to patternN)
- {
- if (pattern is TRUE)
- {
- action statements for the pattern
- }
- }
- }
- • execute any END statements
- 4 Unless otherwise specified by redirection, all output via “print” or “printf”
- statements goes to the default standard output file, called “$tempStdOut”.
- 5 Comments in the source code, which begin with a “#” and continue to the end
- of the line, are ignored.
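- 
- The schematic loop in step 3 can be made concrete in C. In this sketch (all names
- hypothetical, not hAWK's source), a rule pairs a pattern test with an action, a record
- is one line, and the BEGIN and END statements are optional callbacks. The sample rule
- mimics the hAWK rule  $0 ~ /TEST/ { ++numTests }.
```c
#include <stdio.h>
#include <string.h>

/* One "pattern { action }" pair. */
typedef struct {
    int  (*pattern)(const char *record);   /* nonzero means TRUE */
    void (*action)(const char *record);
} Rule;

/* BEGIN, then each record against every rule in order, then END.
   A record is one line (the default). In this sketch, lines longer
   than the buffer are simply split into several records. */
void run_program(FILE *input, void (*begin)(void),
                 const Rule *rules, size_t n_rules, void (*end)(void))
{
    char record[1024];
    size_t i;

    if (begin != NULL)
        begin();
    while (fgets(record, sizeof record, input) != NULL) {
        record[strcspn(record, "\r\n")] = '\0';   /* drop the line ending */
        for (i = 0; i < n_rules; ++i)
            if (rules[i].pattern(record))
                rules[i].action(record);
    }
    if (end != NULL)
        end();
}

/* Sample rule in the spirit of:  $0 ~ /TEST/ { ++numTests } */
static int numTests = 0;

static int has_test(const char *record)
{
    return strstr(record, "TEST") != NULL;
}

static void count_test(const char *record)
{
    (void)record;
    ++numTests;
}
```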
-
- BEGIN, END, and pattern–action blocks may occur in any order in the source for
- the program. Programs may also contain function definitions, which are
- introduced by the “function” keyword, and take the general form:
- "function" funcName(parameter1, parameter2,...local variables)
- {
- statements making up the function body
- }
- If a function is generally useful, it may be placed in a library file to save duplication.
- You’ll find little emphasis on libraries, since it costs very little to duplicate a function
- right in the main program, and this makes the programs easier to read.
- Library files should be reserved solely for function definitions to avoid confusion.
-
- hAWK automatically reads in your input files one “record” at a time, also breaking each
- record into “fields”. The current record is in the built-in variable $0, and the fields
- are in $1, $2, …$NF (where NF is another built-in variable giving the number
- of fields in the current record). By default a record is the same as a line and fields are
- separated by blanks or tabs, so you can think of the default as reading your input one
- line at a time into $0 and making the individual words available in $1, $2 etc (but note
- that all punctuation except blanks, tabs, and returns will still be present in the fields).
- For example, if the current line in an input file reads
- "for (i = 0; i<7; ++i)"
- then that will be the content of $0, and the fields will be
- $1 = "for", $2 = "(i", $3 = "=", $4 = "0;", $5 = "i<7;", $6 = "++i)", with
- NF, the current number of fields, set to 6.
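- 
- A minimal C sketch of this default splitting (hypothetical names, and an assumed
- limit of 100 fields). It splits the record in place and returns NF; note that the
- closing parenthesis stays attached to the last field.
```c
#include <string.h>

#define MAX_FIELDS 100   /* hypothetical limit for this sketch */

/* Default field splitting: a field is a run of characters other than
   blanks and tabs, so punctuation stays attached ("(i", "0;").
   Splits `record` in place, sets field[1]..field[NF] to mirror
   $1..$NF, and returns NF. field[0] is left unused. */
int split_fields(char *record, char *field[MAX_FIELDS + 1])
{
    int nf = 0;
    char *p = record;

    for (;;) {
        p += strspn(p, " \t");            /* skip separators */
        if (*p == '\0' || nf == MAX_FIELDS)
            break;
        field[++nf] = p;                  /* start of the next field */
        p += strcspn(p, " \t");           /* find its end */
        if (*p != '\0')
            *p++ = '\0';                  /* terminate the field */
    }
    return nf;
}
```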
-
- Here’s a real program to give you a taste ("$EnumSwitch", in the “hAWK
- programs” folder):
- #$EnumSwitch
- #Select a bunch of enums, and run Hawk on the front selection
- # -optionally select the entire enum body from '{' to '}' with Balance
- #Leave "Show std out" and "Select all of stdout" checked
-
- { gsub(/=[^,]*/, " ")#remove initializations for the enum constants
- gsub(/=(.)*$/, " ")#ditto
- gsub(/[,{};]/, " ")#remove remaining punctuation, leaving just the enums
- for ( k = 1; k <= NF; k++)#build an array containing the enum names
- case[++i] = $k
- }
-
- END { print "switch (??)"
- print "\t{"
- for (k = 1; k <= i; ++k)
- {
- print "case " case[k] ":"
- print "\t"
- print "break;"
- }
- print "default:"
- print "\t"
- print "break;"
- print "\t}"
- }#end program
- Given a list of names from an enum definition, such as
- "{left, right, up, down, twilightZone = 999}" this program generates
- switch (??)
- {
- case left:
-
- break;
- ...etc…
- case twilightZone:
-
- break;
- default:
-
- break;
- }
- To run this program: first select a list of comma-separated names (typically use the
- contents of an enum definition); select "hAWK" from the calling application’s menu;
- select "$EnumSwitch" from the "Main program" popup; (note the "Take input from:"
- popup will then read "Front text selection"); and click the Run button. The generated
- "switch" statement will appear in a window called "$tempStdOut", ready to be copied
- and pasted into your working window.
-
- Grouping and breaking lines
- The rules for organizing and grouping your program lines differ a bit from the rules
- for C; a <Return> (also called newline) can stand for a semicolon after most hAWK
- statements, the price of this being that lines cannot be arbitrarily broken as in C,
- to avoid confusion between ending a statement and merely continuing it to the next
- line. The rules below are listed in rough order of their impact on whatever C
- formatting habits you have.
-
- • When in doubt, use a backslash '\' immediately followed by a <Return> to continue a
- long line, as with preprocessor macros and strings in C. For example:
- x = y + (z - 1) + SomeFunction(param1, param2\
- , param3, param4) + w;
- • Long conditional tests can be broken to the next line immediately after any logical
- operator (&&, ||, !). Eg:
- if ( lineNumber >= maxLines &&
- $0 != "")
- • A long line may be broken after a comma, eg
- x = y + (z - 1) + SomeFunction(param1, param2,
- param3, param4) + w;
- • The '{' that begins an action should be placed on the same line as the end of the
- pattern for it, eg
- FNR == 1 || FNR == 2 ||
- FNR == 3 { #Note '{' is on same line as end of pattern
- print
- }
- • A comment in hAWK begins with a '#' and continues to the end of the line. A comment
- can be placed at the end of any line except a line that is continued with a backslash and
- <Return>.
- • Group multiple statements together with '{' and '}', as in C, eg
- if ($0 ~ /TEST/)
- {
- print "TEST on line", FNR
- ++numTests
- }
- • When in doubt, terminate a single statement with a semicolon. Multiple statements
- may be placed on one line if separated by semicolons, eg
- if (a >= b) print "a is bigger"; else print "b is bigger";
- or
- do ++x; while (x < maxForX);
- • In if-else and do-while constructs, the “else” and “while” keywords should either
- be placed on a new line or preceded by a semicolon or '}'. In other words, clearly
- signal the end of the “if” or “do” part, so that the “else” or “while” doesn’t pop
- up by surprise:
- these are OK:
- if (a > b) ++b; else ++a
-
- if (a > b) ++b
- else ++a
-
- do {--x; print x} while (x > 0)
-
- these are not:
- if (a > b) ++b else ++a
-
- do ++x while (x < maxForX);
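- 
- A few of these rules can be expressed as a small C check on the end of a line. This is
- a simplified sketch (hypothetical name line_continues): it covers only the trailing
- backslash, &&, || and comma cases, and ignores comments and strings.
```c
#include <string.h>

/* Returns 1 if the physical line ends in a way that lets the statement
   continue on the next line, 0 otherwise. */
int line_continues(const char *line)
{
    size_t n = strlen(line);

    if (n > 0 && line[n - 1] == '\\')   /* backslash-<Return>: no trimming */
        return 1;
    while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t'))
        --n;                            /* ignore trailing blanks */
    if (n > 0 && line[n - 1] == ',')
        return 1;
    if (n >= 2 && (strncmp(line + n - 2, "&&", 2) == 0 ||
                   strncmp(line + n - 2, "||", 2) == 0))
        return 1;
    return 0;
}
```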
-
- ----------------------
- The Command line and ARGV[]
- ----------------------
- To run a hAWK program, you must tell hAWK which program to run, and what files
- to use for input data, with other optional details. Classically, these file names etc
- are passed to AWK in an array of pointers called argv; hAWK works the same way,
- but these names are generated for you when you set up a hAWK run using the
- setup dialog, saving you the work of typing them all in each time.
-
- All you really need to know about the command line is that, at the time a program
- is run, the names of the input files it is being asked to deal with are contained in
- the array named ARGV, and the number of input files equals ARGC-1 (where
- ARGV is a built-in array name, and ARGC is a built–in variable name). Input file
- names are full path names, so typical contents are
- ARGV[1] = "Disk:folder:...:folder:First_Input_file"
- ...
- ARGV[ARGC-1] = "Disk:folder:...:folder:Last_Input_file".
- Running the sample program “$EchoFullPathNames” on some input files will provide
- you with a specific example. Why not give it a try? Use your calling application to
- select some files for multi–file operations (“searching”), then run
- $EchoFullPathNames and see what results. This is the complete program:
- BEGIN {
- for (i = 1; i < ARGC; ++i)#note ARGV[0] is just "hAWK"
- print ARGV[i]
- }
-
- Details follow on the command line generated by hAWK’s setup dialog, in case you
- are interested in modifying hAWK. You may also find this background helpful if you
- use the hAWK() function, which executes another program from within a program
- and requires an explicit command line as its argument (see ch. Q, “The hAWK function”).
-
- The command line passed to hAWK from the setup dialog takes the general form
- hAWK -fProgramName {-fLibraryName} {-vVariable=value} --
- {InputFileName}
- where the {} brackets indicate that an item may be repeated or omitted. For example, if
- running a program "$BigSort" with supporting library "Sort_Routines", with the files
- to be sorted being "Text1" and "Text2" then the command line passed to hAWK by the
- setup dialog will be something like
- hAWK -f$BigSort -fSort_Routines -- HardDrive:Code Folder:Sub Folder:Text1
- HardDrive:Code Folder:Sub Folder:Text2
- The "-f", "-v", and "--" are little markers that hAWK uses to tell what's what.
- "-f" means a program file, "-v" means a variable assignment, and "--" means
- that only input files (if anything) follow this marker.
-
- By the time the command line becomes available to you within your hAWK program,
- it has been reduced to a hAWK array of strings called ARGV, which contains only "hAWK"
- in ARGV[0] followed by the names of the input files in ARGV[1], ARGV[2] etc,
- and ARGC is set to the number of elements in the ARGV array, namely the number of
- input files plus one. The last input file name is ARGV[ARGC-1].
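- 
- To see how the markers sort things out, here is a C sketch that splits an argv-style
- list into program files, variable assignments, and an ARGV array with ARGC. The struct
- and function names and the fixed capacity are hypothetical, and it assumes the line has
- already been split into separate arguments (so a path containing spaces is one argument).
```c
#include <string.h>

#define MAX_ITEMS 32   /* hypothetical capacity for each list */

typedef struct {
    const char *programs[MAX_ITEMS];     /* main program, then libraries */
    int n_programs;
    const char *assignments[MAX_ITEMS];  /* "name=value" strings */
    int n_assignments;
    const char *ARGV[MAX_ITEMS + 1];     /* ARGV[0] is "hAWK" */
    int ARGC;                            /* input files + 1 */
} CommandLine;

/* Returns 0 on success, -1 if a list would overflow. Assumes "-f" and
   "-v" are glued to their argument, as in "-f$BigSort". */
int parse_command_line(int argc, char *argv[], CommandLine *cl)
{
    int i, inputs_only = 0;

    cl->n_programs = cl->n_assignments = 0;
    cl->ARGV[0] = "hAWK";
    cl->ARGC = 1;

    for (i = 1; i < argc; ++i) {
        const char *arg = argv[i];

        if (!inputs_only && strcmp(arg, "--") == 0) {
            inputs_only = 1;                  /* only input files follow */
        } else if (!inputs_only && strncmp(arg, "-f", 2) == 0) {
            if (cl->n_programs == MAX_ITEMS) return -1;
            cl->programs[cl->n_programs++] = arg + 2;
        } else if (!inputs_only && strncmp(arg, "-v", 2) == 0) {
            if (cl->n_assignments == MAX_ITEMS) return -1;
            cl->assignments[cl->n_assignments++] = arg + 2;
        } else {
            if (cl->ARGC == MAX_ITEMS + 1) return -1;
            cl->ARGV[cl->ARGC++] = arg;       /* an input file */
        }
    }
    return 0;
}
```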
-
- Normally, the input file names are the only things on the command line of interest
- that you don't already have access to. You'll have acess to the variables anyway,
- and one can't help thinking that it would be an odd program indeed that needed to know
- its own "ProgName".
-
- Here's a hAWK program that prints a complete list of the input file names passed
- to it ($EchoFullPathNames again):
- BEGIN {
- for (i = 1; i < ARGC; ++i)#note ARGV[0] is just "hAWK"
- print ARGV[i]
- }
- If you included this block in $BigSort above, then the output would be something like
- HardDrive:Code Folder:Sub Folder:Text1
- HardDrive:Code Folder:Sub Folder:Text2
- As you can see, you're getting the full path names of the files, not just the file names.
- Here's a version that prints just the file names proper:
- BEGIN {
- for (i = 1; i < ARGC; ++i)
- {
- n = split(ARGV[i], names, ":")
- print names[n]
- }
- }
- for which the output would for example be
- Text1
- Text2
- The important thing to note here is that hAWK deals with full path names for files,
- especially relevant if you are redirecting input or output (more on this later).
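- 
- Since ':' separates the parts of a Macintosh full path name, the file name proper is
- whatever follows the last colon. The same idea in C, as a small sketch (the function
- name file_name_part is hypothetical):
```c
#include <string.h>

/* Return a pointer to the file name part of a full path name such as
   "HardDrive:Code Folder:Sub Folder:Text1"; if there is no ':', the whole
   string is the file name. */
const char *file_name_part(const char *full_path)
{
    const char *colon = strrchr(full_path, ':');
    return colon != NULL ? colon + 1 : full_path;
}
```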
-
- When you assign values to variables using the "Set variables" button in the setup
- dialog, the result is the same as if you assigned the value in the BEGIN block of
- your program. However, you should NOT use quotes if you are assigning a text
- string to a variable using "Set variables"—for example, the variable assignment
- find=text to find
- within the "Set variables" dialog is equivalent to the statement
- BEGIN {find = "text to find"}
- within your actual program. This is meant to be a convenience, but is perhaps a
- nuisance, in that any spaces between the '=' and the value are significant:
- find =text
- is not the same as
- find = text
- —that space between the '=' and the 't' of "text" will be included in the string for "find".
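- A standard command-line awk applies the same rule to its "-v" option, the counterpart
- of hAWK's "Set variables": everything after the '=' becomes the value, spaces included.
- A minimal sketch, assuming a POSIX shell and a standard awk in place of hAWK; show_find
- is a hypothetical name. Note that a standard awk's -v also interprets escape sequences
- such as \t in the value.

```shell
# Hypothetical helper: passes its first argument, unchanged, as the value of "find".
# The brackets in the output make any leading space visible.
show_find() {
    awk -v "find=$1" 'BEGIN { printf "[%s]\n", find }'
}
```

- So show_find 'text' prints [text], while show_find ' text' prints [ text].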
-
- The "Set variables" button can be used to set the value of any hAWK variable,
- whether your own or a predefined (built-in) variable, and it is easier to change
- a variable this way than to edit the program itself. Up to 10 variables can be set
- with "Set variables", and your variable settings will be saved for the next run
- if you click the "Save settings" button in the setup dialog. For an illustration and
- more details, see the “Running hAWK programs” chapter.
-
- ----------------
- Variables and constants
- ----------------
- Variable names and types
- hAWK has many built–in variables, and you can use your own. A variable of your
- own devising springs into existence when you first use it, with no need to declare
- it (excepting perhaps local variables for functions, which need to be not so much
- declared as “mentioned”—see “Local variables in functions” below).
-
- Variable names in hAWK take the same form as C names: a letter or underscore
- followed by any number of letters, underscores, and digits.
-
- hAWK has both scalar variables and one–dimensional arrays. The value of a variable or
- array element may be a (floating–point) number OR a string, and the specific type at
- any time depends on how you use the variable. While numeric values in hAWK are
- nominally floating–point, if you consistently use a variable as an integer you will
- get predictable results. For example,
- for (i = 0; i <= 1; ++i)
- print i
- will print two values, 0 and 1, guaranteed.
-
- Uninitialized variables have the numeric value 0 and the string value "" (the null, or
- empty, string). Note that this differs from a variable that has been explicitly initialized
- to zero: in that case, while the numeric value is zero, the string value
- is "0".
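- To see the difference, the sketch below compares an uninitialized variable u with a
- variable z explicitly set to zero. It assumes a POSIX shell and a standard awk in place
- of hAWK, and uninit_demo is a hypothetical name. Each comparison prints 1 for true and
- 0 for false.

```shell
# Hypothetical demo: u is never assigned; z is explicitly set to zero.
uninit_demo() {
    awk 'BEGIN {
        z = 0
        a = (u == 0); b = (u == ""); c = length(u)
        print "u:", a, b, c          # u is both 0 and "", with length 0
        a = (z == 0); b = (z == ""); c = length(z)
        print "z:", a, b, c          # z is 0, but its string value is "0"
    }'
}
```

- The output is "u: 1 1 0" followed by "z: 1 0 1".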
-
- Constants
- Constants can be integers, floating–point numbers, or strings. For example,
- x = "A string of text";
- y = 7;
- z = .31415926E1;
- pat = "[_A-Za-z][_A-Za-z0-9]*"; (a string to be interpreted as a regular
- expression - it matches a hAWK variable name).
-
- Record and field variables
- After the BEGIN block(s) of a program have been executed, a hAWK program proceeds
- to automatically retrieve records from your input files one at a time to the built–in
- variable $0, and individual fields in the current record can be accessed with the
- built–in variables $1, the first field, $2 etc up to $NF, the last field, where NF is
- a built–in that records the current number of fields. Records are separated according
- to the string contained in the built–in record–separator variable, RS. By default
- this contains just a return, ie RS = "\n", so a record is the same as a line. You can
- change the value of RS, and setting RS to "" (the null string) will cause empty lines
- to be treated as the record separator. Note that the record separator itself is
- trimmed from the record.
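- Setting RS to the null string ("paragraph mode") can be tried with a standard awk. A
- minimal sketch, assuming a POSIX shell in place of hAWK; paragraphs is a hypothetical
- name, and the program reads its records from standard input.

```shell
# Hypothetical demo of paragraph mode: reads records from standard input.
paragraphs() {
    awk '
        BEGIN { RS = "" }                  # blank lines separate records
        { print NR ": " NF " fields" }     # newlines inside a record also split fields
    '
}
```

- For the input "a b", "c", an empty line, then "d e", this prints "1: 3 fields" and
- "2: 2 fields": the first two lines form a single record.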
-
- Similarly, fields are separated in accordance with the value of the field–separator
- variable, FS. By default the field separator is a regular expression standing for
- “one or more blanks or tabs”, and as a nicety if you use the default value of FS then
- any leading blanks or tabs will be trimmed away from the first field, $1.
-
- References to non-existent fields (fields after $NF) produce the null string.
- However, assigning to a non-existent field (e.g., $(NF+2) = 5) will increase the
- value of NF , create any intervening fields with the null string as their value, and
- cause the value of $0 to be recomputed, with the fields being separated by the value of
- OFS, the output field separator. A negative field number is an error.
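- The sketch below shows both effects on a two-field record, using "-" as OFS so the
- new, empty field is visible. It assumes a POSIX shell and a standard awk in place of
- hAWK; grow_fields is a hypothetical name.

```shell
# Hypothetical demo: reads records from standard input.
grow_fields() {
    awk '{
        print "NF=" NF " $5=[" $5 "]"   # $5 lies beyond NF, so it is the null string
        OFS = "-"
        $(NF + 2) = 5                   # creates an empty $3 and sets $4 to 5
        print "NF=" NF                  # NF has grown from 2 to 4
        print                           # $0 was rebuilt with OFS between fields
    }'
}
```

- For the input line "a b" the output is "NF=2 $5=[]", "NF=4" and "a-b--5".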
-
- Many functions in hAWK allow you to optionally specify a string for them to work on,
- and if you don’t specify a string then they use $0, the current input record, by default.
- For example,
- print "some text"
- will do just that—print the string "some text" to the standard output, whereas
- print
- all by itself, will print the contents of $0 to stdout, and thus it has the same effect as
- print $0
- Note that “print” tags on the contents of ORS, by default a return, to its output, so in the
- default case the return that was trimmed away when retrieving the current input
- record is added back. Thus, the hAWK program that consists of the one line
- {print}
- will echo all of its input to stdout (the file $tempStdOut) without change, though a flurry
- of activity involving returns takes place behind the scenes.
-
- This little program prints the individual fields of each input record to individual lines:
- {for (i = 1; i <= NF; ++i)
- print $i
- }
- —note that the field specifier can be a variable as in “$i”, and doesn’t have to be
- a constant.
-
- Built–in variables
- hAWK's built-in variables are:
- ARGC the number of input files plus one
- ARGV array of command line arguments. The array is indexed from 0 to ARGC - 1,
- the input file names being ARGV[1] through ARGV[ARGC-1].
- Dynamically changing the contents of ARGV can control the files used for data.
- FILENAME the name of the current input file. If no files are specified on the command line,
- the value of FILENAME is "-". A hAWK program may do all of its work in a BEGIN
- block, with no need for input (generating a list of random numbers for example).
- FNR the input record number in the current input file. Reset to 1 when starting a new
- input file. Hence the pattern “FNR == 1” detects the start of each file.
- FS the input field separator, a blank by default. If the default FS is used then
- leading blanks and tabs are trimmed from $1.
- IGNORECASE controls the case-sensitivity of all regular expression operations.
- If IGNORECASE has a non-zero value, then pattern matching in
- rules, field splitting with FS, regular expression matching with
- ~ and !~, and the gsub(), index(), match(), split(),
- and sub() pre-defined functions will all ignore case when doing
- regular expression operations. Thus, if IGNORECASE is not equal to
- zero, /aB/ matches all of the strings "ab", "aB",
- "Ab", and "AB". The initial value of IGNORECASE is zero,
- so all regular expression operations are normally case-sensitive.
- NF the number of fields in the current input record.
- NR the total number of input records in all input files seen so far.
- OFMT the output format for numbers, %.6g by default.
- OFS the output field separator, a blank by default.
- ORS the output record separator, by default a newline.
- RS the input record separator, by default a newline. RS is exceptional
- in that only the first character of its string value is used for
- separating records. If RS is set to the null string, then records are
- separated by blank lines. When RS is set to the null string, then
- the newline character always acts as a field separator, in addition
- to whatever value FS may have.
- RSTART the index of the first character matched by match(); 0 if no match.
- RLENGTH the length of the string matched by match(); -1 if no match.
- SUBSEP the character used to separate multiple subscripts in array
- elements, by default "\034" (decimal 28), a control character very rare in text.
- (and three added for the Macintosh version)
- RUNERR short for "run error", a file name that you can use to print your own error
- messages to, as in print "Error during run" > RUNERR. Default name
- is $tempRunErr, and you'll find the file in the same folder as $tempStdOut.
- STDPATH path name that can be prefixed to any file name you wish to be written to the
- same folder as stdout ($tempStdOut). Typically looks like
- "Disk:folder1...:THINK C folder:" and typical use looks like
- outname = "MyOutFile"
- fullOutName = STDPATH outname;
- print "something" > fullOutName;
- TIME the date and time at the start of the run, eg "Sunday, October 13, 1991 07:58 AM"
-
- Local variables in functions
- Function definitions in hAWK resemble those of C a bit, but local variables require
- an odd syntax. They must be listed in the parameters of the function, after the real
- parameters, in order to be treated as local. All other variables in hAWK have global
- scope. For example, in
- function SumArray(arr,    key, sum)
- {
- for (key in arr)
- sum += arr[key];
- return sum
- }
- the only real parameter is the array name “arr”. This function sums up the contents of
- the array and returns the sum, used as in “sum = SumArray(x);” where x is an
- array containing numbers. The variables “key” and “sum” look like orphans there
- in the parameters, but this is just the hAWK way of declaring local variables (the
- extra spaces before them are merely a common convention for showing where the real
- parameters end). Neither key nor sum can be affected by any statements outside the
- SumArray function (that is, they are local in scope), and as a bonus hAWK initializes
- even local variables to 0 (and the null string) each time the function is called.
- Functions are described in more detail a little later in the chapter “User-defined functions”.
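- A runnable version of SumArray is sketched below. It assumes a POSIX shell and a
- standard awk in place of hAWK; sum_demo and the global variable named key are
- hypothetical, chosen to show that the function's local key does not disturb a global
- variable with the same name.

```shell
sum_demo() {
    awk '
        # Parameters after the real one ("arr") act as local variables.
        function SumArray(arr,    key, sum) {
            for (key in arr)
                sum += arr[key]
            return sum
        }
        BEGIN {
            x[1] = 10; x[2] = 20; x["three"] = 3
            key = "global"                 # a global that SumArray must not touch
            print SumArray(x), key
        }'
}
```

- This prints "33 global": the sum of the elements, and the untouched global.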
-
- Setting variables on the command line
- When variables are set using the “Set variables” option in the setup dialog, no quotes
- should be used around strings, and no space should be put between the equals sign and
- the string or number unless you want it to be included in the value. For example, the
- equivalent of
- BEGIN {find = "some text to find"; first = 7;}
- in the “Set variables” dialog would be
- find =some text to find
- first =7
- (the space before the equals sign is optional).
-
- Conversion between numbers and strings
- Conversion of a variable’s value between number and string is automatic in hAWK when
- circumstances call for it, and can be forced by you as well. When an operator is strictly
- numeric, the value of its operands will be forced to numbers if necessary, and similarly
- if an operator expects to deal strictly with strings then values will be forced to strings.
-
- For example, in
- a = "102";
- b = a + 1;
- “a” starts out as a string, but the “+” operator deals strictly with numbers, so “a”
- is converted to the number 102.0 on the second line.
-
- And in
- a = 27;
- b = "trombones";
- c = a b; #there is a space between a and b
- we see the invisible “concatenation” operator at work. Two variables or constants separated
- by just a space are treated as strings by hAWK and concatenated together. So “a” is converted
- to a string on the third line, and “c” ends up holding the string "27trombones".
-
- Some operators (all of the comparison operators == <= >= etc for example) can accept
- either strings or numbers. When this is the case, the rule is that the operation proceeds
- numerically if both operands are currently valid numbers, but proceeds as a string operation
- otherwise.
-
- You can force a variable to be treated as a string by concatenating the null string to it. For
- example, no matter what the values of a and b are, the comparison
- a "" == b
- will proceed as a string comparison.
-
- And you can force a variable to be treated as a number by adding 0 to it, as in
- a + 0 == b + 0
- but note in this case that both operands should be forced to numeric type.
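- The conversion rules above can be checked in one go. The sketch below assumes a POSIX
- shell and a standard awk in place of hAWK, and convert_demo is a hypothetical name. It
- compares "10" and "9" first as strings (where "10" sorts first, because '1' comes
- before '9'), then forces both to numbers, and repeats the concatenation and "+" examples.

```shell
convert_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"            # string constants, so comparisons are by string
        s = (a < b)                  # string comparison: "10" sorts before "9"
        n = (a + 0 < b + 0)          # forced numeric: 10 < 9 is false
        c = 27 "trombones"           # concatenation converts 27 to "27"
        d = "102" + 1                # "+" converts "102" to the number 102
        print s, n, c, d
    }'
}
```

- The output is "1 0 27trombones 103".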
-
- Arrays
- Arrays are subscripted with an expression between square brackets, arr[expr].
- Array values can be numbers or strings, but the index is always interpreted as
- a string. For example, when you write
- arr[1]
- the 1 is converted to the string "1" for use as the array index, so arr[1] is
- the same as arr["1"]. This sort of array is called “associative” since it can
- associate one string of text with any other, eg
- arr["John Henry"] = "was a log-drivin man"
-
- If the index expression is an expression list ( expr1, expr2, expr3,... ) then the array
- subscript is a string consisting of the concatenation of the (string) value of each
- expression, separated by the value of the SUBSEP variable, which is by default
- “\034” (decimal 28, a control character rarely found in text). This facility is used to simulate
- multiply–dimensioned arrays. For example:
- i = "A"; j = "B"; k = "C"
- x[i, j, k] = "hello, world"
- assigns the string "hello, world" to the element of the array x
- which is indexed by the string "A\034B\034C".
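- The sketch below recovers the separate subscripts from such an index by splitting it on
- SUBSEP, and confirms that the stored index really is "A" SUBSEP "B" SUBSEP "C". It
- assumes a POSIX shell and a standard awk in place of hAWK; subsep_demo is a
- hypothetical name.

```shell
subsep_demo() {
    awk 'BEGIN {
        i = "A"; j = "B"; k = "C"
        x[i, j, k] = "hello, world"
        for (key in x) {
            n = split(key, parts, SUBSEP)                 # recover the subscripts
            same = (key == ("A" SUBSEP "B" SUBSEP "C"))   # the real index string
            print n, parts[1] parts[2] parts[3], same
        }
        if (("A", "B", "C") in x)                         # multi-subscript "in" test
            print "found"
    }'
}
```

- The output is "3 ABC 1" followed by "found".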
-
- The special operator "in" may be used in an "if" statement to see if an array has
- an index consisting of a particular value:
- if (val in array)
- print array[val]
- If the array has multiple subscripts i, j, k, use
- if ((i, j, k) in array)
- instead. The alternative
- if (array[val] != "")
- actually creates the element array[val] if it does not exist, so using “in”
- is usually better.
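- The difference is easy to observe by counting elements before and after each test. A
- minimal sketch, assuming a POSIX shell and a standard awk in place of hAWK; in_demo and
- its helper count are hypothetical names.

```shell
in_demo() {
    awk '
        function count(arr,    k, n) { for (k in arr) n++; return n + 0 }
        BEGIN {
            a["x"] = 1
            if ("y" in a) print "unexpected"
            print count(a)                # still 1: "in" did not create a["y"]
            if (a["y"] != "") print "unexpected"
            print count(a)                # now 2: referencing a["y"] created it
        }'
}
```

- The output is 1 and then 2.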
-
- The "in" construct may also be used in a for loop to iterate over all the elements of an
- array:
- for (i in arr)
- delete arr[i] # or print arr[i] , or print i, arr[i]
- An element may be deleted from an array using the delete statement. New elements should
- not be added to an array while looping over it with the "in" for-loop, since hAWK isn’t
- quite smart enough to handle that very well.
-
- Behind the scenes, indexes for an array are stored in a hash table. Retrieval of an array
- element takes constant time up to a moderate array size (~1000), but as array size
- increases retrieval time will increase as a linear function of the size.
-
- Some array examples:
- for (i = 1; i <= 100; ++i)
- x[i] = i;
- This does what you would expect, creating x[1] = 1, ..., x[100] = 100. Note, however, that
- while i is treated as an integer in the for loop, it is converted to the string representation
- for that number when used as the index for x.
-
- for (i = 1; i <= NF; ++i)
- wordCounter[$i] += 1;
- Here we see the real power of hAWK’s associative arrays. $i is a string containing a field
- on the current input line, and this string is used as an index into the wordCounter array.
- If there is no element in the array yet for the index, a new element is created (and
- initialized to 0/the null string, as for regular variables). The array element itself holds
- just a count of how many times the string has been seen. Obviously, you can’t access these
- array elements by incrementing a numeric index—here’s where “in” comes in:
- for (word in wordCounter)
- print word, "was seen", wordCounter[word], "times."
- prints out the words used to index wordCounter, together with the word counts, a sample
- line being
- parsimonious was seen 1 times.
- The one drawback of this simple example is that the words will be printed in a rather
- arbitrary order (internally, the entries in a hash table are being accessed). However,
- even this shortcoming can be overcome. The sample program “$WordFrequency”
- shows how to sort an array such as wordCounter into dictionary order on the index.
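- The $WordFrequency sample is not reproduced here. As a stand-in, the sketch below counts
- words with the wordCounter loop above and hands the unordered END output to the Unix
- sort command. It assumes a POSIX shell and a standard awk in place of hAWK; wordcount
- is a hypothetical name.

```shell
# Hypothetical stand-in: count words in awk, then sort the lines outside awk.
wordcount() {
    awk '
        { for (i = 1; i <= NF; ++i) wordCounter[$i] += 1 }   # one element per distinct word
        END { for (word in wordCounter) print word, wordCounter[word] }   # arbitrary order
    ' "$@" | sort
}
```

- For the input lines "b a b" and "c b" this prints "a 1", "b 3" and "c 1".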
-
- while (getline x > 0)
- lines[++n] = x;
- The “getline x” will retrieve records from your current input file to the variable x,
- from the current position to the end of the file. Each record is saved away as an element
- in the array “lines”. Here the index is a number (technically the string for the
- number) and the element is a string, the reverse of the last example.
-
- times[3,7] = 21;
- The actual index is "3" "\034" "7" concatenated together. A multi-dimensional
- array can be run through in the same way as in C:
- for (i = 1; i <= iMax; ++i)
- {
- for (j = 1; j <= jMax; ++j)
- {
- print times[i,j] #or whatever
- }
- } # note "for (k in times) print times[k]" could also be used.
-
- -------
- Patterns
- -------
- Patterns and actions
- At the top level, a hAWK program consists of patterns and actions, of the general form
- pattern { action }
- When a pattern evaluates to true (non–zero), the corresponding action is taken.
- Patterns resemble the conditions found in a C if-statement, but several kinds of
- patterns, notably BEGIN, END and patterns using the matching operator '~', are not
- found in C. As described earlier, hAWK will automatically read in your input one
- record at a time to the variable $0, and each pattern is evaluated in turn; if the pattern
- is true for the current input, then the action statements are executed.
-
- A missing pattern evaluates to true, so action statements with no preceding pattern
- are executed for every input record. A missing action is equivalent to
- { print }
- which prints the input record to stdout. It’s equivalent to {print $0}, by the way.
-
- Here’s a sample pattern-action block that is often useful:
- FNR == 1 { z = split(FILENAME, names, ":") }
- FNR stands for "file number of records", reset to 1 at the beginning of each input file.
- FILENAME is a variable holding the full path name of the current input file. The split on
- ':' splits FILENAME into an array, treating the ':' as the element separator. Often, one
- wants just the file name proper without the disk and folders, and this is given by
- names[z]. For example, if FILENAME = "Disk:folder:thefile" then the split produces
- names[1] = "Disk", names[2] = "folder", and names[3] = "thefile", with "z" being
- set to 3. The statement "print names[z], FNR" will print the current input file name
- and current line number to stdout.
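- The same FNR == 1 pattern works in a standard awk. The sketch below assumes a POSIX
- shell and Unix path names, so it splits FILENAME on "/" instead of ":"; file_starts is
- a hypothetical name. It also prints NR, to show that NR keeps counting across files
- while FNR starts again at 1.

```shell
# Hypothetical demo: prints the file name, FNR and NR at the start of each file.
file_starts() {
    awk 'FNR == 1 {
        z = split(FILENAME, names, "/")   # "/" on Unix; hAWK on the Mac uses ":"
        print names[z], FNR, NR
    }' "$@"
}
```

- With a two-line file "one" and a one-line file "two", the output is "one 1 1" and
- "two 1 3".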
-
- The “Summary of patterns” section at the end of this chapter contains a small program
- that will let you try out patterns as they occur to you. Or you could use $RunClip.
-
- BEGIN and END
- BEGIN and END are two special kinds of patterns which are not tested against the input.
- The action parts of all BEGIN patterns are merged as if all the statements had been
- written in a single BEGIN block. They are executed before any of the input is read.
- Similarly, all the END blocks are merged, and executed when all the input is exhausted
- (or when an exit statement is executed). BEGIN and END patterns cannot be combined
- with other patterns in pattern expressions. BEGIN and END patterns cannot have
- missing action parts.
-
- BEGIN {FS = ",[ \t]*|[ \t]+"}
- sets the field separator to either a comma followed by optional blanks and tabs or
- one or more blanks and tabs—a common field separator in a real database.
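- To check that this FS splits a mixed line as intended, the sketch below prints each
- field on its own line. It assumes a POSIX shell and a standard awk in place of hAWK;
- split_fields is a hypothetical name.

```shell
# Hypothetical demo: reads records from standard input.
split_fields() {
    awk 'BEGIN { FS = ",[ \t]*|[ \t]+" }            # comma+blanks, or blanks alone
         { for (i = 1; i <= NF; ++i) print i ": " $i }'
}
```

- For the line "a, b c,d" the output is "1: a", "2: b", "3: c" and "4: d".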
-
- END blocks are often used to finish up after all the input has been seen, as in this
- little program:
- {out[++n] = $0}
- END {for (i = n; i >= 1; --i) print out[i]}
- which accumulates all input records in the array “out”, and then at the end
- prints out the records in reverse order.
-
- Expressions as patterns
- Simply put, an expression is any sensible combination of variables, operators, and
- (rarely) function calls. When an expression used as a pattern evaluates to a non–zero or
- non–null result, the action following it will be carried out.
-
- The most common sort of expression used as a pattern is the comparison, involving the
- operators ==, <=, >=, >, <, and !=. These can be used with any hAWK variable or
- calculated result, and it is a refreshing improvement over C to be able to test
- two strings for equality with the simple “a == b” instead of “!strcmp(a,b)”.
-
- Comparison patterns quite often involve tests on the current input, such as
- “$1/$2 >= 100”, “$3 == "Wilhelmina"”, “$0 != ""”, the last testing that
- the current input line is not empty. Built–in variables are also popular, as in
- the “FNR == 1” example a few paragraphs above, which detects the start of an
- input file. Your own variables can of course appear, as in
- $1 != lastFieldOne { print "New field one is", $1
- lastFieldOne = $1
- }
- which prints the contents of the first field on the input line whenever it changes.
-
- In a comparison, if both sides are numeric then the comparison is made numerically,
- but if one side evaluates to a string then the comparison is done in terms of strings,
- with the other side first being converted if necessary to a string.
-
- String-matching patterns
- The matching operator, denoted by a tilde (~), allows you to detect whether one string
- contains another string, though technically that other string is treated as a “regular
- expression”. More on regular expressions in just a minute, but for now you can form a
- regular expression to look for from a string of characters by putting a forward slash
- before and after them. For example, if you wish to determine if the current input line
- contains the string "exception", then the pattern
- $0 ~ /exception/
- will do it. Note that it could match the line
- "while this is not an exceptional case, there are other"
- that is, the match does not have to be an entire word.
-
- By default if you omit the string for the matching operator to check against, and further
- omit even the matching operator, leaving just the regular expression enclosed in slashes,
- then the match will be done against the current input line $0. In other words,
- /regular expression/ {action}
- is the same as
- $0 ~ /regular expression/ {action}
- —and since even the action is optional (recall the default is to print $0), about the shortest
- hAWK program you can write is
- /a/ #equivalent to $0 ~ /a/ { print $0 }
- which will print any input line containing an “a” to stdout.
-
- To match punctuation explicitly in your expression you should precede it with a
- backslash, eg /question\?/, /the end of the sentence\./, /array\[index\]/.
-
- You can use quotes instead of the forward slashes to surround the text of your regular
- expression with the same results. In this case, though, the matching operator must
- explicitly appear. Eg
- $0 ~ "Mars" {print "red planet detected on input line", FNR}
- And to match punctuation explicitly inside the quotes, you should precede the punctuation
- with two (that’s right, two) backslashes. For example, to match "the end." use
- string ~ "the end\\."
-
- Using forward slashes instead of quotes around your regular expression has three small
- advantages: matching against $0 doesn’t need to be fully written out, only single escapes
- are necessary to match punctuation, and after a while the forward slashes will stand out
- as you read your programs, signalling a matcher.
-
- The negation of the matching operator, “!~”, allows you to determine if a string does
- not contain some regular expression, as in
- $2 !~ /A/ {print "Error, second field does not contain the letter A"}
- and any points mentioned above for ~ apply to !~.
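- The quoting rules above can be checked with a few comparisons. The sketch below
- assumes a POSIX shell and a standard awk in place of hAWK; match_demo is a
- hypothetical name. It shows that "the end\\." in quotes matches a literal period,
- while an unescaped period matches any character.

```shell
match_demo() {
    awk 'BEGIN {
        s1 = "the end."; s2 = "the endx"
        a = (s1 ~ "the end\\.")      # "\\." in a string becomes \. in the regex
        b = (s2 ~ "the end\\.")      # escaped period: "x" does not match
        c = (s2 ~ /the end./)        # unescaped period matches any character
        d = (s1 !~ /Mars/)           # !~ is true when there is no match
        print a, b, c, d
    }'
}
```

- The output is "1 0 1 1".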
-
- Regular expressions
- Regular expressions aren’t as hard to use as a first impression suggests, and if you try
- out a dozen you’ll be hooked, guaranteed. In regular expressions certain characters
- have special “powers” that allow you to search for entire related groups of strings
- with a single specifying string. Consider that an ordinary “find” command will not let
- you completely match the following variations of a string: plurals; possessives;
- variable blanks, tabs and especially returns between the words of a string; one or more
- alternate words in the string; the complete word that contains some special substring;
- two or more complete strings at once (one or the other).
-
- A regular expression is nothing more than a string of text with optional special
- “metacharacters”, and in most cases the string to be used can result from
- the evaluation of a variable, or the concatenation of several strings or variables.
- This means you can build the regular expressions for your program during the
- execution of your program, modifying them on the fly to suit changing circumstances.
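- As a small illustration of building a regular expression while the program runs, the
- sketch below concatenates two words into the expression ^(red|green)$ and tests a list
- of words against it. It assumes a POSIX shell and a standard awk in place of hAWK;
- build_regex_demo and the word list are hypothetical.

```shell
build_regex_demo() {
    awk 'BEGIN {
        w1 = "red"; w2 = "green"
        re = "^(" w1 "|" w2 ")$"     # built by concatenation: ^(red|green)$
        n = split("red blue green greenish", words, " ")
        for (i = 1; i <= n; ++i)
            if (words[i] ~ re)       # a variable can stand in for /.../
                print words[i]
    }'
}
```

- Only "red" and "green" are printed; "greenish" fails because of the ^ and $ anchors.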
-
- Parts of a regular expression can be grouped (with ordinary parentheses), and later in
- the regular expression or in a replacement string can be referred to by the group “tags”
- \1, \2, ... \9 where \1 refers to the group started by the first left parenthesis, \2 to
- the second, etc. These allow you to match a small pattern within the context of a larger
- one, detect duplicate expressions, change the order of the groups and so on. Note that
- parentheses have the highest precedence of all regular expression “operators”, so
- they serve two purposes: changing the order in which the metacharacters apply, and
- marking the boundaries of a group, for later reference via \1..\9. More on this in a bit.
-
- Regular expressions are built from ordinary characters, the escape sequences
- \t \n \b \B \w \W \< \> \1 \2 \3 \4 \5 \6 \7 \8 \9
- and from the metacharacters
- \ ^ $ . [ ] | ( ) * + ?
- which are the ones with the special powers mentioned above. As you saw in the above
- section, if a regular expression contains no metacharacters then it behaves like an
- ordinary “find” string in that each character in the regular expression must match
- a character in the string being searched. The following table summarizes all
- character usage in a regular expression (where a b c are ordinary characters,
- m is a metacharacter, r is a regular expression, and d is a digit):
-
- c matches the non-metacharacter c itself
- \m matches the literal character m, eg \$ matches the dollar sign.
- . matches any single character except newline.
- ^ matches the beginning of a line or a string.
- $ matches the end of a line or a string.
- [abc...] character class, matches any one of the characters a or b or c etc.
- [^abc...] negated character class, matches any character except a, b, c... and newline.
- (Ranges of characters may be abbreviated in character classes, as in
- [0-9] which matches any digit, [A-Za-z] which matches any letter,
- [^0-9] which matches anything but a digit).
- \w matches a “word” character, exactly equivalent to [0-9A-Za-z]
- \W matches a non-word character, ie [^0-9A-Za-z]
- \< matches the beginning of a word.
- \> matches the end of a word.
- \b matches the beginning or end of a word (a word boundary).
- \B matches the boundary (beginning or end) of a set of non-word characters.
- \t matches a tab.
- \n matches a newline (the Return key).
- r1|r2 alternation: matches either r1 or r2, eg "blue|green"
- r1r2 concatenation: matches r1 followed by r2.
- r+ matches one or more r's.
- r* matches zero or more r's. (Note that a match of zero r's can occur anywhere in the text.)
- r? matches zero or one r.
- (r) grouping: matches r. Parentheses have two distinct uses: to override
- default precedence of metacharacter operators, and to tag a subexpression
- for subsequent reference.
- \1...\9 stand for whatever text the first through ninth set of parentheses currently
- match, counting opening parentheses from left to right. Note that if the
- pair of parentheses has a + or * or ? operator after it, then all of the
- matches are included, eg /(foo)+bar/ applied to "foofoofoobar" will set
- \1 to "foofoofoo". To get just the first foo, use /(foo)\1*bar/ - then
- \1 is set to "foo". (Perl users note this is the opposite of what
- you are used to).
- \ddd is interpreted as an octal number, as in C. The digits exclude 8 and 9,
- needless to say, and there can be from 1 to 3 digits in the number.
- Note that \1 through \7 are interpreted as subexpression tags unless
- followed immediately by another octal digit (eg \23 is not tag 2 followed
- by a 3, it is the octal number 19 decimal). \8 and \9 are always tags,
- since 8 and 9 are not octal numbers. To refer to octal numbers 1 to 7,
- use \01 to \07. To follow a tag with a low number (eg \2 followed by 3),
- use the octal representation of the number (eg \2\063 -- \063 equals
- 51 decimal, the ASCII code for 3).
-
- The metacharacters ^ and $, which match the beginning and end of strings, and
- \b \B \< \>, which match various boundaries, don’t actually match any characters;
- rather, they force alignment to a particular text position. For example,
- /\brun\b/ will always match just “run” if it matches anything, but will
- not match "runner" or "brunt". By comparison, /\Wrun\W/ won’t match
- “runner” or “brunt” either, but it will include any non–word character that
- happens to come before or after the word “run”. Normally you won’t want to
- include leading or trailing spaces etc in the match.
-
- Parentheses () have the highest precedence, allowing you to override default
- precedence when needed. The “repetition” operators * + ? have the next–highest
- precedence, followed by concatenation, with alternation having the lowest precedence of
- all. For example, in abc*d the * applies only to the c since the repetition operator acts
- before concatenation, and in abd|def the | applies to abd and def since concatenation
- binds them together into little groups of three before alternation can play.
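- The precedence rules can be observed with match(), which sets RSTART and RLENGTH (see
- the built-in variables above). The sketch below assumes a POSIX shell and a standard
- awk in place of hAWK; precedence_demo and the test strings are hypothetical.

```shell
precedence_demo() {
    awk 'BEGIN {
        match("xabcccd", /abc*d/);   print RSTART, RLENGTH   # * applies to c only
        match("xababd", /(ab)*d/);   print RSTART, RLENGTH   # parentheses: * applies to ab
        match("xdef", /abd|def/);    print RSTART, RLENGTH   # | separates abd from def
    }'
}
```

- The output is "2 6", "2 5" and "2 3": each match starts at the second character.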
-
- Regular expressions can be used just to locate an instance of a pattern, as in
- $0 ~ /extern/
- but they can also be used to specify text for replacement, by using the “sub” and
- “gsub” functions. Looking ahead just a bit, these functions take a regular expression as
- the first argument, the string to use for replacement as the second argument, and the
- string to do the search and replace in as the third argument, with $0 used by default if
- there is no third argument. “sub” does a single substitution on the text, and “gsub”
- does all possible non-overlapping substitutions. Within the replacement strings of
- these functions, you can use \1 through \9 to refer to text currently matched by tagged
- subexpressions, and the ampersand “&” stands for all of the text that was matched.
- To put a plain ampersand in the replacement, use “\&”.
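- The sketch below shows & and \& in replacement strings, and the difference between sub
- and gsub. It assumes a POSIX shell and a standard awk in place of hAWK; subst_demo is a
- hypothetical name. Standard awk's sub and gsub do not support \1 through \9 in the
- replacement, so only & and \& are shown. Inside a quoted awk string, \& must be written
- "\\&".

```shell
subst_demo() {
    awk 'BEGIN {
        s = "cat hat"; t = s; u = "foo boo"
        gsub(/[ch]at/, "[&]", s)     # & stands for the matched text
        gsub(/at/, "\\&", t)         # \& (written "\\&" in a string) is a literal &
        sub(/o/, "0", u)             # sub replaces only the first match
        print s; print t; print u
    }'
}
```

- The output is "[cat] [hat]", "c& h&" and "f0o boo".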
-
- At this point some considerable exampling usually helps:
- The quick brown matches just that, "The quick brown". Note it would match
- "The quick brown" in "The quick brownie".
- red fox\. matches "red fox." (the period must be escaped for a literal match).
- [ \t] matches a single space or tab (that’s a space before the \).
- [ \t]+ matches any consecutive run of spaces and tabs in any mix.
- [0-9]+ matches an integer (read “one or more digits”)
- [+-]?[0-9]+ matches an integer, together with optional preceding sign.
- \<[A-Za-z'’-]+\> matches an English word.
- houses? matches "house" or "houses".
- m(iss)*ippi matches "mippi", "missippi", "mississippi", "missississippi", etc.
- ar*g matches "ag", "arg", "arrg", "arrrg", etc.
- MyFunction\( matches "MyFunction(".
- array\[index\] matches "array[index]".
- array\[.+\] matches "array[i]", "array[j]", "array[2*q-1]", etc.
- \\([0-7]|[0-7][0-7]) matches "\d" or "\dd" where d is an octal digit.
- ([^\\]?|(\\\\)+)" (horrors, be brave) matches an unescaped quote or a quote
- preceded by an even number of backslashes—in other words
- a true quote in C. The backslash is a metacharacter, so matching
- one literally requires a backslash before the backslash.
- The[ \t]+quick[ \t]+brown matches "The quick brown" with variable spaces and tabs
- between the words.
- \/\* matches the start of a C comment, "/*". The forward slash is
- escaped so that you can place the whole regular expression inside
- forward slashes. The escape before '/' would not be needed if you
- placed the expression inside quotes, but then you would need two
- escapes before the '*', ie "/\\*".
- \/\*.*\*\/ matches all of a one–line C comment, "/* - anything - */".
- ^Z matches a 'Z' at the beginning of a string.
- ^. matches the first character of a string.
- .$ matches the last character of a string.
- ^.*$ matches any string completely (not much use).
- ^A..$ matches any string which is three characters long, the first
- being an 'A'.
- ^(A|B).* matches all of any string that begins with 'A' or 'B'.
- ^[AB].* does likewise.
- (\w|_)+ matches a C name (identifier) or an integer constant.
- ((->)|(\.))(mem\b) matches “mem” when it is immediately preceded by “->”
- or “.”, and is not the beginning of a longer word. For
- replacement purposes in a “sub” or “gsub”, the part
- before “mem” is given by \1, and mem itself is \4.
- gsub(/((->)|(\.))(mem\b)/, "\1\4ber") will turn “->mem” into “->member”
- and “.mem” into “.member” everywhere in the current
- input line $0, ignoring things like “remember” or
- “->memories”.
- gsub(/\bFuncName([ \t]*\()/, "FunctionName\1") will replace “FuncName” by
- “FunctionName” everywhere in the current input line
- $0, provided it is followed on the same line by an opening
- parenthesis, with optional spaces or tabs between the name
- and “(”. The match extends from the “F” of
- “FuncName” up to and including the “(”, so the “(”
- and any intervening white space must be put back into
- the replacement string by tagging them in parentheses
- and using \1 after “FuncName” to refer to what was
- matched by the first set of parentheses in the pattern.
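To experiment with the simpler patterns in this list outside hAWK, a few lines of shell will do. The sketch below assumes a POSIX shell and a standard awk (gawk, mawk or similar), not hAWK itself, which runs only from a calling application; grep_awk is a hypothetical helper name. Only standard extended-regular-expression syntax is used: hAWK's \<, \>, \b and \w escapes and its \1 references are left out because standard awk does not guarantee them.

```shell
#!/bin/sh
# grep_awk (hypothetical helper): print every input line that the
# extended regular expression in $1 matches.
# Assumes a standard awk, not hAWK; no hAWK-only escapes are used.
grep_awk() {
    awk -v re="$1" '$0 ~ re'
}

# m(iss)*ippi: zero or more copies of "iss" between "m" and "ippi"
printf '%s\n' mippi missippi mississippi mssippi | grep_awk '^m(iss)*ippi$'

# ar*g: zero or more r's; houses?: an optional final s
printf '%s\n' ag arg arrg abg | grep_awk '^ar*g$'
printf '%s\n' house houses housed | grep_awk '^houses?$'

# [+-]?[0-9]+ anchored at both ends: a line that is one signed integer
printf '%s\n' 42 -7 +3 x9 | grep_awk '^[+-]?[0-9]+$'
```

The ^ and $ anchors are added so that each test line must match as a whole; without them, houses? would also match the "house" inside "housed".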
-
- This program prints all input lines containing one-line comments:
- /\/\*.*\*\// {print}
- (since {print} is the default action, it could be left out).
-
- Within a character class most metacharacters are taken literally. The exceptions are
- the escaping backslash \, the negating ^ (only at the beginning), and the range hyphen -
- (only between two characters). For example,
- [A-Za-z-]+ matches an English word, hyphens included
- [-A-Za-z]+ does the same
- [\-A-Za-z]+ also does the same (the '\' is unnecessary but harmless)
- ^[^^] matches any single character other than '^' at the beginning of a string
- [\^] matches a '^'.
- The toughest metacharacter to remember is the '^', which has three meanings: at the beginning
- of a character class it signals a negated character class; outside of a character class it matches
- the beginning of a string; and when escaped, or anywhere in a character class other than the first
- position, it matches a literal '^'.
-
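Adding + to a bracket expression is the usual way to pull whole words out of a line. This minimal sketch (POSIX shell and standard awk assumed; words is a hypothetical name) prints each run of letters and hyphens in turn, using match() and substr() to step through the line. The final '-' in [A-Za-z-] is literal because it does not sit between two characters.

```shell
#!/bin/sh
# words (hypothetical helper): print each run of letters and hyphens
# found in the input, one run per output line.
words() {
    awk '{
        line = $0
        # match() sets RSTART and RLENGTH for the leftmost, longest run
        while (match(line, /[A-Za-z-]+/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)   # continue after the run
        }
    }'
}

printf '%s\n' 'well-known, 42 up-to-date!' | words
```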
- Regular expressions are “left greedy”; where there could be more than one match in a
- string, a regular expression matches the leftmost one, and extends the match as far as
- possible. For the implications of this, see the discussion of the “match” function in the
- “Built–in string and file functions” section of the next chapter, “Actions”.
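Left greediness bites hardest with .*. In this sketch (standard awk assumed; show_match is a hypothetical name), the one-line C comment pattern from the examples above, written with [*] in place of \* so that it can be passed in as a plain string, is applied to a line holding two comments: the match starts at the first "/*" and runs on to the last "*/", swallowing the code in between.

```shell
#!/bin/sh
# show_match (hypothetical helper): report RSTART, RLENGTH and the text
# matched by the regular expression $1 in each input line.
show_match() {
    awk -v re="$1" '{
        if (match($0, re))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
        else
            print "no match"
    }'
}

# Leftmost start, longest extent: the run of a's begins at position 2.
echo 'xaaay' | show_match 'a+'

# Two comments on one line: the greedy .* runs to the LAST "*/".
echo '/* a */ x = 1; /* b */' | show_match '/[*].*[*]/'
```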
-
- Now that we’re starting to get the hang of things, here are more examples using the replacement
- functions “sub” and “gsub” mentioned above. The format is sub(r,s,t) where r is a
- regular expression, s is the replacement string, and t is the string in which the search
- and replace is to be done. The contents of t before and after each call are spelled out below.
-
- using t = "Don’t run that prune over, runt!":
- sub(/run/, "fly", t) turns t into "Don’t fly that prune over, runt!"
- gsub(/run/, "fly", t) turns t into "Don’t fly that pflye over, flyt!"
- gsub(/\brun\b/, "fly", t) turns t into "Don’t fly that prune over, runt!"
- gsub(/run/, "t&k", t) turns t into "Don’t trunk that ptrunke over, trunkt!"
- using t = "#define FOO 1":
- sub(/#define\W+(\w+)\W+([0-9]+)/, "int \1 = \2;",t) turns t into
- "int FOO = 1;" (\W+ means one or more non-word characters, \w+
- means one or more word characters, [0-9]+ means one or more digits;
- two groups are tagged).
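The first group of calls above can be checked in standard awk, which supports '&' in the replacement string but not hAWK's \1 and \2 tagged groups, so the #define example is left out. A minimal sketch, assuming a POSIX shell and standard awk; subst is a hypothetical helper that prints the number of substitutions before the changed line, and a plain ASCII apostrophe stands in for the curly one in the manual's test string.

```shell
#!/bin/sh
# subst (hypothetical helper): usage  subst sub|gsub REGEX REPLACEMENT
# Applies sub() or gsub() to $0 for each input line and prints
# "count: changed line". '&' in REPLACEMENT stands for the matched text.
subst() {
    awk -v mode="$1" -v re="$2" -v rep="$3" '{
        if (mode == "gsub")
            n = gsub(re, rep)
        else
            n = sub(re, rep)
        print n ": " $0
    }'
}

t="Don't run that prune over, runt!"
printf '%s\n' "$t" | subst sub run fly      # only the leftmost "run"
printf '%s\n' "$t" | subst gsub run fly     # every "run", even inside words
printf '%s\n' "$t" | subst gsub run 't&k'   # & is the matched "run"
```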
-
- Three programs are supplied to help you do general–purpose listing of matches or
- search–and–replace: $MFSLister searches for either plain text or a regular expression,
- which you set with “Set variables” in the setup dialog, and lists the file name and line
- number of all single–line matches to stdout; $MFS_SuperLister does much the same, but finds
- matches that span a variable number of lines; and $MFS_SuperReplace does the
- ultimate search and replace, matching either plain text or full–blown regular
- expressions over a variable number of lines, handling any number of files at once,
- documenting the (post–change) locations of all changes to stdout. Heck, it even prints
- the fragments of original text before the changes, so that if you mess up you can at least
- (manually) undo the damage. (Exercise: write $MFS_Undo_SuperReplace).
-
- Compound patterns
- The logical operators ||, &&, and ! can be used to combine simple patterns into compound ones.
- These operators function the same as in C, specifically: || is the inclusive–or operator; && is
- the and operator; and ! is negation, with evaluation of a compound pattern proceeding only as
- far as necessary to determine whether the whole pattern is true or false.
-
- Some examples:
- $1 ~ /DATA/ && $2+0 > 0
- is true when the first field contains the string "DATA" and the second field is numeric and
- greater than zero. If the first field does not contain "DATA" then the second field is not checked.
- $1 == "DATA" || $1 == "INFO"
- is true when the first field is exactly equal to "DATA" or "INFO". The check for "INFO" is
- performed only if the check for "DATA" fails.
- $2 != 0 && !($3/$2 > 10 || $3/$2 < 1)
- first checks that $2 is not zero, to avoid dividing by zero, and then evaluates to true if
- $3 divided by $2 falls in the range 1 to 10.
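The third example relies on && stopping early. A minimal sketch in standard awk (assumed, as is the hypothetical name in_ratio) feeds it a line whose second field is zero: because $2 != 0 is false, the divisions on the right are never attempted. That matters in practice, since some awks, gawk among them, treat division by zero as a fatal error.

```shell
#!/bin/sh
# in_ratio (hypothetical helper): print lines where $3/$2 lies in the
# range 1 to 10. The left side of && guards the divisions on the right.
in_ratio() {
    awk '$2 != 0 && !($3/$2 > 10 || $3/$2 < 1)'
}

# a: 4/2 = 2, kept; b: zero divisor, skipped safely; c: 20, too big; d: 0.5, too small
printf '%s\n' 'a 2 4' 'b 0 5' 'c 1 20' 'd 4 2' | in_ratio
```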
-
- The ? : operator can be used to choose between two patterns, and is like the same
- operator in C. If the first pattern is true then the pattern used for testing is the second
- pattern, otherwise it is the third. Only one of the second and third patterns is evaluated.
-
- $2 != 0 ? $3/$2 > 1 : $3 == 0
- first checks to see if field 2 is non–zero; if so, the pattern is true if $3/$2 > 1; otherwise,
- the overall pattern is true if field 3 is also zero.
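The conditional pattern can be checked the same way. This sketch (standard awk assumed; pick is a hypothetical name) shows that a line with a zero second field is kept only when its third field is zero too, and that $3/$2 is never computed for such lines.

```shell
#!/bin/sh
# pick (hypothetical helper): the conditional pattern from the text.
# When $2 is non-zero the line is printed if $3/$2 > 1; when $2 is zero
# it is printed only if $3 is zero as well.
pick() {
    awk '$2 != 0 ? $3/$2 > 1 : $3 == 0'
}

printf '%s\n' 'a 2 4' 'b 0 0' 'c 0 5' 'd 4 2' | pick
```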
-
- Range patterns
- Range patterns consist of two patterns separated by a comma. Given
- pattern1, pattern2
- this evaluates to true for the first input line that matches pattern1, and thereafter is
- true up to and including the first line encountered that matches pattern2. Both patterns
- may occur on the same line, in which case the range pattern is true for just the one
- line (and a check for pattern1 begins again on the next line). If the second pattern is
- never seen, matching continues to the end of all input. Range patterns, as with BEGIN
- and END, cannot be compounded with other patterns to form more complicated patterns.
-
- Note that pattern2 specifies the last line to be matched, for example
- NR == 1, NR == 2
- matches the first and second lines of input.
-
- Range patterns are useful only with input that has been well–organised on a line–by–line
- basis, with clear signals for the start and end of a group of lines. An ideal case would be
- a file with markers dedicated to indicating the start and end of a group, such as
- Start 10 11 -23
- 47 101 96 End
- Start 19 23 End etc
- in which case your program could analyze groups with
- /Start/, /End/ {actions for the group}
- but in real life the only way you’ll see an input file like this is if you make it yourself.
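A minimal sketch of the Start/End idea, assuming a POSIX shell and standard awk (groups is a hypothetical name). NR is printed alongside each line to show which lines the range pattern selects, including the last group, where both patterns match on the same line.

```shell
#!/bin/sh
# groups (hypothetical helper): print, with line numbers, every line from
# one containing "Start" through the next line containing "End".
groups() {
    awk '/Start/, /End/ { print NR ": " $0 }'
}

printf '%s\n' 'junk' 'Start 10 11 -23' '47 101 96 End' 'more junk' \
    'Start 19 23 End' | groups
```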
-
- Summary of patterns
- A list of beasts in the pattern zoo (regex stands for regular expression, pat
- stands for pattern, str stands for string variable):
- Pattern                        Example
- ---------------------------    -------------------------------
- BEGIN                          BEGIN blocks are done before all input
- END                            END blocks are done after all input
- /regex/                        /Mary[ \t]+had/
- str ~ /regex/ (or !~)          $1 ~ /(\-)?[0-9]+/
- str ~ "regex" (or !~)          $1 ~ "(\\-)?[0-9]+"
- relational expression          NF > 4
- pattern && pattern             FNR == 1 && /File title:/
- pattern || pattern             /Vermont/ || /Maine/
- pattern ? pattern : pattern    $3 != 0 ? $2 / $3 > 25 : $2 < 0
- ( pattern )                    - see next line
- ! pattern                      !($0 == "" || $0 ~/^The end$/)
- pattern1 , pattern2            FNR == 5, FNR == 8
-
- There’s no substitute for doing it yourself. Here’s a small program that will let
- you try out your own patterns—it’s not saved separately, so select it and save it
- into your “hAWK programs” folder under a name that begins with a '$', such as
- “$PatternTester”. Substitute your test pattern for the word “pattern” below
- when you have one to try out. Grab some example input from somewhere, paste it
- into a new window, call hAWK, select “$PatternTester”, and run it with the
- “All of front text” input option, leaving “Show stdout” with a check mark. All input
- lines that match your pattern will produce a comment in stdout, which will be shown
- to you after the run.
-
- #A small program for testing patterns.
- #Replace the word "pattern" on the next line with your pattern.
- pattern {
- print "Pattern matched input line", NR, "which was:"
- print "\t", $0
- ++n
- }
- END { if (n > 0)
- print "Total matches:", n;
- else
- print "No matches were found.";
- }#the end
-
- -------
- Actions
- -------
- Introduction
- Virtually everything you have learned about patterns can be carried over to actions for
- constructing conditional tests (excepting BEGIN, END, range patterns, and default
- behaviour when parts of a pattern are left out). For example,
- $1 ~ /NUM/ {if ($2 ~ /RANGE/)
- --then the first field contained "NUM", and the
- second field contained "RANGE"--
- }
- or
- FNR < 10 {if (FNR == 1)
- print "First line of current file is:", $0
- else if (FNR == 2)
- print "Second line of current file is:", $0
- etc
- }
- which demonstrate that it is possible to place a general test in the pattern, and then proceed
- with more specific tests in the action statements.
-
- You’ve probably noticed that hAWK expressions strongly resemble C code, and this is
- no accident—leaving aside the advanced machinery of C dealing with pointers, structs
- and unions, and multi–dimensional arrays, what you know about writing C carries over
- to hAWK. There are some differences that require a bit of adjustment, such as no need to
- declare variables, no prototypes for functions, and no brackets around the arguments of
- some built–in functions (print, getline). And there are some additions (most notably
- regular expressions, built–in string functions such as “match”, and the way input is
- automatically retrieved to $0) which require a bit of work to grasp comfortably. But
- regular expressions were the only tough part; the rest is easy by comparison, and
- you should count your hAWK diploma as a foregone conclusion if you keep going here.
-
- You have met variables, including built–in and field variables, and the operators which
- are especially useful for building patterns: the sections below will round out the list of
- operators, describe hAWK’s built–in functions dealing with numbers and strings, and
- introduce control–flow statements (if, for, while, etc) which allow you to choose between
- alternatives or repeatedly execute statements.
-
- Knowledge of C will speed up learning hAWK. However, hAWK is simpler than C, so if you
- are new to C as well you should find that learning hAWK will speed up learning C. Whatever
- your background, you should regard hAWK itself as an essential part of this manual; if you
- have a small problem, or an idea that wants polishing, whip up a little hAWK program and
- give it a try.
-
- A preview of “print”
- Ultimately, your hAWK program will produce output. The “print” statement will answer
- almost all of your output needs; it is simpler in form than the “printf” function, which has
- more sophisticated formatting. Pass “print” a list of variables or constants separated by
- commas, and they will be printed to stdout, with the commas replaced by the output field
- separator (the built–in variable OFS, by default a blank). The contents of ORS (the output
- record separator, by default a newline) will be appended to the end of what was printed.
-
- For example:
- this one–line program
- {print FNR, $0}
- will duplicate all input to stdout, adding a line number to the beginning of each line. The
- number will be reset to 1 at the beginning of each input file, but all input files will be
- concatenated together in stdout.
- {print $1}
- will print just the first field of each input line to stdout.
- $1 ~ /extern/ {print FILENAME, FNR}
- will print the (full path) file name and line number where the word “extern”
- was seen.
-
- Variables and strings may be concatenated together by using a space instead of a comma
- between them, for example
- a = "Sesqui"
- b = "alien"
- print a "ped" b
- which produces "Sesquipedalien" (note there is no built–in spelling checker). Concatenation
- is slower than using commas to separate the items for “print”, so it is best used only when you
- must avoid having the OFS space between two items. Note that
- print a, "ped", b
- produces "Sesqui ped alien".
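The sketch below (standard awk assumed; print_demo is a hypothetical name) sets OFS to "|" instead of its default blank so that the separators are visible, numbers each input line with FNR, and then prints the same three items with and without commas.

```shell
#!/bin/sh
# print_demo (hypothetical helper): commas in a print list become OFS,
# while items written side by side are concatenated with nothing between.
print_demo() {
    awk 'BEGIN { OFS = "|" }          # make the separator visible
    { print FNR, $1 }                  # line number, OFS, first field
    END {
        a = "Sesqui"; b = "alien"
        print a "ped" b                # concatenation
        print a, "ped", b              # three items separated by OFS
    }'
}

printf '%s\n' 'x y' 'z' | print_demo
```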
-
- More on “print” later, but for the time being if you find yourself wondering what an
- operator or function produces—assign the result to a variable and print it out.
-
- Expression operators
- With the exception of string concatenation and the matching operators, the operators in
- hAWK are the same as C operators. They apply to both numbers and strings wherever that makes
- sense, and all numbers are floating point numbers. Note that if a variable is
- assigned an integer value then it can be treated as an integer—for example, if
- i = 1 at some point, then later the test
- if (i == 1) will evaluate to true (non-zero), with no failure due to obscure
- floating point rounding trouble.
-
- The operators in hAWK, in order of increasing precedence, are:
- --------------------------------------------
- = += -= *= /= %= ^=
- Assignment. Both absolute assignment (var = value) and operator-assignment (the
- other forms) are supported. “a += b” is equivalent to “a = a + b”.
-
- ?: The C conditional expression. This has the form
- expr1 ? expr2 : expr3
- If expr1 is true, the value of the expression is expr2, otherwise it is expr3. Only one
- of expr2 and expr3 is evaluated.
-
- || logical OR. In “a || b” if a is true then b is not evaluated.
-
- && logical AND. In “a && b” if a is false then b is not evaluated.
-
- ~ !~ regular expression match, negated match. See “String-matching patterns”.
-
- < <= > >= != ==
- the regular relational operators. Note especially that strings can be
- compared, eg if ($3 == "cat"). In “a <= b” or the like, if both
- arguments are numbers the comparison is done numerically,
- otherwise they are compared as ASCII strings.
-
- blank string concatenation; if a = "John" and b = "Henry" then
- c = a b; produces c = "JohnHenry".
-
- + - addition and subtraction.
-
- * / % multiplication, division, and modulus ( x%y produces the remainder of
- x divided by y, equivalent to x - int(x/y)*y ).
-
- + - ! unary plus, unary minus, and logical negation.
-
- ^ exponentiation.
-
- ++ -- increment and decrement, both prefix and postfix.
-
- $ field reference. $0 is the entire current record, $1 the first field,
- and $NF the last field. Fields may be changed or added.
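A few entries from the table, checked with a standard awk (assumed; operators_demo is a hypothetical name): the modulus identity x - int(x/y)*y, which also holds when x is negative, the difference between numeric and string comparison, and exponentiation.

```shell
#!/bin/sh
# operators_demo (hypothetical helper): a few operator results from the
# table above, computed by a standard awk.
operators_demo() {
    awk 'BEGIN {
        x = 17; y = 5
        print x % y, x - int(x / y) * y       # remainder two ways: 2 2
        r1 = (-17) % 5
        r2 = -17 - int(-17 / 5) * 5
        print r1, r2                          # sign follows x: -2 -2
        num_lt = (10 < 9)                     # two numbers: numeric test
        str_lt = ("10" < "9")                 # two strings: "1" sorts before "9"
        print num_lt, str_lt                  # 0 1
        print 2 ^ 10                          # exponentiation: 1024
    }'
}

operators_demo
```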
-
- Some examples:
- {lines[++n] = $0}
- accumulates all input lines to the array lines[]. The variable “n” starts out as 0, so
- the “++n” produces 1 as the first index. At the end of input “n” is equal to the number
- of input lines seen, so
- END {print lines[1]; print lines[n]}
- would print out the first and last lines of input.
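The same two statements as a complete program, assuming a POSIX shell and standard awk (first_last is a hypothetical name).

```shell
#!/bin/sh
# first_last (hypothetical helper): save every line in lines[1..n],
# then print the first and last ones at the end of input.
first_last() {
    awk '{ lines[++n] = $0 } END { print lines[1]; print lines[n] }'
}

printf '%s\n' one two three | first_last
```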
-
-
- Built–in numeric functions
- hAWK has the following pre-defined arithmetic functions, with x and y as
- arbitrary expressions:
- atan2( y , x ) returns the arctangent of y/x in radians.
- cos( x ) returns the cosine of x, where x is in radians.
- exp( x ) the exponential function "e to the x"
- int( x ) truncates to integer (eg int(7.325) gives 7); to round a
- non–negative x, use int(x + .5).
- log( x ) the natural logarithm function, base e. For log base 10, use
- log(x)/log(10).
- rand() returns a random number, 0 <= rand() < 1.
- sin( x ) returns the sine of x, where x is in radians.
- sqrt( x ) the square root function.
- srand( x ) use x as a new seed for the random number generator. If no
- x is provided, the time of day will be used. The return value
- is the previous seed for the random number generator.
-
- Some examples:
- atan2(0,-1) gives π, and exp(1) gives e.
-
- theta = atan2(y,x)
- r = sqrt(x*x + y*y)
- converts rectangular x,y to polar r,theta.
-
- int(max * rand())
- produces a random integer from 0 to max-1, inclusive.
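The identities and recipes above, evaluated by a standard awk (assumed; numeric_demo is a hypothetical name). The seed passed to srand() is fixed only so that the run is repeatable; the printed 1 just confirms that int(6 * rand()) stayed within 0 to 5.

```shell
#!/bin/sh
# numeric_demo (hypothetical helper): the identities and recipes from the
# examples above, checked with a standard awk.
numeric_demo() {
    awk 'BEGIN {
        printf "%.5f %.5f\n", atan2(0, -1), exp(1)   # pi and e
        x = 3; y = 4                                  # rectangular to polar
        printf "%g %.4f\n", sqrt(x*x + y*y), atan2(y, x)
        print int(7.325), int(2.5 + .5)               # truncation, then rounding 2.5
        srand(1)                                      # fixed seed
        r = int(6 * rand())                           # 0 to 5 inclusive
        ok = (r >= 0 && r <= 5)
        print ok
    }'
}

numeric_demo
```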
-
- Built–in string and file functions
- There is only one string operator, the concatenation operator, invoked when two variables
- or constants are separated by a space. Other useful string manipulations in hAWK are
- carried out by built–in functions. In the following table, r is a regular expression,
- s and t are strings, a and b are arrays, and i and n are integers.
-
- gsub(r, s, t) for each substring matching the regular expression r in
- the string t, substitutes the string s, and returns the
- number of substitutions. If t is not supplied, uses $0.
- index( s, t ) returns the index of the string t in the string s,
- or 0 if t is not present.
- length( s ) returns the length of the string s.
- match( s, r ) returns the position in s where the regular expression r
- occurs, or 0 if r is not present, and sets the values of
- RSTART and RLENGTH.
- split(s, a, r) splits the string s into the array a on the regular
- expression r, and returns the number of fields. If r is
- omitted, FS is used instead.
- sprintf( fmt, expr-list ) formats expr-list according to fmt, and returns the
- resulting string. See the discussion of “printf” for details.
- sub(r, s, t) this is just like gsub, but only the leftmost matching
- substring is replaced. Returns the number of substitutions.
- substr(s, i, n) returns the n-character substring of s starting at i. If n
- is omitted, the rest of s is used.
- tolower( s ) returns a copy of the string s, with all the uppercase
- characters in s translated to their corresponding
- lowercase counterparts. Non-alphabetic characters are
- left unchanged.
- toupper( s ) returns a copy of the string s, with all the lowercase
- characters in s translated to their corresponding
- uppercase counterparts. Non-alphabetic characters are
- left unchanged.
- lookup( s ) returns integer–coded C type of s (s should be a word).
- (At present this function is supported by: EnterAct.
- Types are taken from whatever project is open at the
- time.) See “$LookupTest” or “$XRef” for an example.
- Type integer returned
- ---- ------------
- defined constant or macro 1
- file–scope variable 2
- function 4
- enum constant 8
- typedef 16
- struct tag 32
- union tag 64
- enum tag 128
- other 0
- sort(a,b,s) produces an index in the array “b” that can be used to access
- the elements of “a” in sorted order. The string “s” specifies the
- kind of sort; "a" for ASCII, "n" for numeric, "d" for dictionary
- order, and "ra", "rn", "rd" for reverse of the same. Returns the
- number of elements in the array “b”, which is indexed numerically
- from 1 upwards. The elements of “b” are the indexes of “a” in
- sorted order provided “b” is accessed in the sequence b[1], b[2],
- b[3] etc. Typical use is
- maxIndex = sort(a, b, "d")
- for (i = 1; i <= maxIndex; ++i)
- print a[b[i]]
- which will print the elements of a in sorted dictionary order.
- See “$WordFrequency” and “$XRef_Full” for examples, and
- “$SortTest_Nums” for a simple numeric example.
- time( ) returns the current time, eg "Sunday, October 27, 1991 09:03:30 AM"
- —note this is the time when the function is called, down to the second,
- whereas the TIME variable holds the time at which your program run
- starts, down to the minute. See “$TIME” for an example.
- prompt( s ) displays an OK/Cancel dialog. The string “s” appears at the top of the
- dialog, and you can type in a string in an edit field. Returns what you
- type in, as though it was a string constant. Both the string “s” and what
- you type in are limited to 255 characters. For an example of usage
- see “$PromptTest” and “$YoungMath”. Typical use is
- x = prompt("Enter the number of lines to print:")
- if (x+0 > 0) {
- while (getline lne > 0 && ++i <= x) print lne }
- If you cancel the dialog or press <Return> without typing in any text,
- prompt returns the null string "".
- NOTE this function is only useful if hAWK is called up in the “immediate”
- mode (typically hold down the <Shift> key when selecting “hAWK”). In
- “concurrent” mode, “prompt()” does nothing but return the empty
- string "" without displaying a dialog.
- progress(s) displays the string “s” in a dialog on your screen (the message stays
- on the screen). You can change the message with another “progress”
- call. “progress” returns the number of times it has been called, and
- the dialog goes away by itself at the end of your program run. For a
- test sample, see “$ProgressTest”.
- NOTE this function is only useful if hAWK is called up in the “immediate”
- mode (typically hold down the <Shift> key when selecting “hAWK”). In
- “concurrent” mode, “progress()” does nothing but return 0.
- --and added for hAWK version 2 (mainly file functions):
- Note in the functions below where a file or directory name is required it must
- be a full pathname, of the form “disk:folder1:folder2:...:folderN:filename”
- for a file, or “disk:folder1:...:folderN” or “disk:folder1:...:folderN:”
- for a directory (the second version has a colon at the end). For a disk name,
- use “disk:” rather than “disk”.
- beep( n ) does a SysBeep(n); if the duration "n" is <= 0, the menu bar will
- flash instead. Durations of 0,1,2,5 work best.
- copy( s, t ) copies the file named “s” to the file named “t”. Both file names
- must be full pathnames (disk:folder:...folder:filename). Either
- the location or name or both can be changed. If file “t” already
- exists, it must be closed and unlocked. Both creator and type are
- preserved, and the resource fork is copied as well as the data
- fork. Any kind of file can be copied. To move or rename a file, use
- if (copy(s,t)) remove(s)
- (this is an efficient way to move a file, but there is a separate
- rename() function). NOTE that t's folders will be created if needed.
- Returns 1 if successful, 0 if the copy could not be done.
- exists( s ) returns 1 if the file named “s” exists, 0 if it does not. Any kind
- of file can be tested.
- fdate( s ) returns date/time of last modification of file named “s”, format
- “yr:mo:day:hr:min:sec” where yr is 4 digits, and the rest are 2
- (eg always 01 rather than just 1). The length of the string is
- always 19 (or 0 if no date could be extracted) and the colons
- and digits always occupy the same positions.
- fsize( s ) returns the size in bytes of the data fork only of the file named “s”.
- getclip( n ) returns the calling application’s current clipboard text, up to
- a maximum of the first “n” bytes. Use n = 0 or omit it entirely
- if you want the entire clipboard. For example, if the current
- clip is “Some text here” then getclip(6) returns “Some t”
- whereas getclip(0) or getclip() returns the entire clip. At
- present this function is supported by: EnterAct.
- putclip( s ) replaces the calling application’s (private) clipboard with
- the string “s”. Note that other applications won’t see the
- change until you switch out of the calling app. The length
- of s is limited to 32,767 characters (as are all hAWK strings).
- See the “$Clip...” functions in the “hAWK programs” folder
- for examples using getclip/putclip. Supported by: EnterAct.
- list( s, a ) given file or directory full pathname in “s”, produces list of
- full pathnames for all TEXT files in the directory (either the
- directory named or the directory holding the file), as elements
- indexed 1,2,3... in the array “a”. Note subdirectories are also
- excluded. Returns the number of files in the list.
- nested( s, a ) given a file full pathname in “s”, generates list of full pathnames
- for directories at the same level ("sibling folders"); given directory
- name, generates list of subdirectories at the top level in the named
- directory (“offspring folders”). The list is returned as elements
- indexed 1,2,3... in the array “a”. In other words, the same as
- “list” but for folders rather than TEXT files. Note neither “list”
- nor “nested” looks beneath the top level of the folder in question.
- Returns the number of directories in the list.
- remove( s ) deletes the file named “s”, provided it is closed and unlocked. Use
- with caution: this is not undoable unless you get lucky using your
- favourite file recovery tool. Returns 1 if the file was deleted,
- 0 otherwise.
- rename( s, t ) takes the file with full pathname “s”, and renames it “t”. The
- new name “t” can be a full pathname, or just the new file name
- proper, as in
- rename("Disk:dir1:aardvark", "Disk:dir1:fruitbat")
- or equivalently
- rename("Disk:dir1:aardvark", "fruitbat")
- This function works only with files, not directories or volumes,
- returning 1 if the rename was carried out, 0 if not.
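hAWK's sort() is not part of standard awk. For readers who move programs elsewhere, the sketch below shows the same index-array idea in plain POSIX awk. It is not hAWK's implementation: sort_index and less are hypothetical helpers, only the "a" (ASCII) and "n" (numeric) kinds are covered, and a simple insertion sort is used, so it suits modest arrays.

```shell
#!/bin/sh
# sorted (hypothetical helper): usage  sorted a|n   (one value per line)
# Builds an index array idx[1..m] whose entries are the keys of vals[]
# in sorted order, in the spirit of hAWK's sort(vals, idx, kind).
sorted() {
    awk -v kind="$1" '
    function less(x, y, kind) {          # compare as numbers or as strings
        return (kind == "n") ? (x + 0 < y + 0) : ((x "") < (y ""))
    }
    function sort_index(a, b, kind,    n, i, j, k, t) {
        n = 0
        for (k in a)                     # collect the keys, in any order
            b[++n] = k
        for (i = 2; i <= n; ++i) {       # insertion sort on the keys
            t = b[i]
            for (j = i - 1; j >= 1 && less(a[t], a[b[j]], kind); --j)
                b[j + 1] = b[j]
            b[j + 1] = t
        }
        return n                         # number of elements in b
    }
    { vals[NR] = $0 }
    END {
        split("", idx)                   # make idx an empty array
        m = sort_index(vals, idx, kind)
        for (i = 1; i <= m; ++i)
            print vals[idx[i]]
    }'
}

printf '%s\n' 10 9 100 | sorted a     # ASCII order
printf '%s\n' 10 9 100 | sorted n     # numeric order
```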
-
- The version 1 functions form the heart of hAWK, and you will find examples of usage of
- one or more of these in nearly all the sample programs. The version 2 functions have
- more limited scope, but keep them in mind when you need to wrestle with files.
-
- Within the replacement string 's' of gsub(r,s,t) and sub(r,s,t), a '&' is taken to stand
- for the entire string of text that was matched by the regular expression 'r'. For example,
- gsub(/cat/, "&s", t) with t = "cat and dogs" produces t = "cats and dogs" after
- the substitution. Use “\&” if you want a literal '&' in the replacement string.
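A sketch of '&' in replacement strings, assuming standard awk rather than hAWK (amp_demo is a hypothetical name). One difference to be aware of: in standard awk a literal '&' is written "\\&" inside a string constant, because the string's own escape processing removes one backslash before sub or gsub sees it; the text above gives the hAWK form as "\&".

```shell
#!/bin/sh
# amp_demo (hypothetical helper): "&" in a replacement string stands for
# the matched text; in standard awk a literal "&" is written "\\&".
amp_demo() {
    awk 'BEGIN {
        t = "cat and dogs"
        gsub(/cat/, "&s", t)
        print t                        # cats and dogs
        v = "boo"
        gsub(/boo/, "&&s", v)
        print v                        # booboos
        u = "salt"
        sub(/salt/, "& \\& pepper", u)
        print u                        # salt & pepper
    }'
}

amp_demo
```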
-
- Using sub, gsub, and match effectively is entirely a matter of becoming comfortable
- with regular expressions (practice makes perfect). The regular expressions in these
- functions can be static, as in
- if (match($0, /struct/))...
- or dynamic (the contents of a variable) as in
- wordStart = "(^|[^a-zA-Z'-])"#beginning of string or non–word character
- optLetters = "[a-zA-Z'-]*"#zero or more word characters
- findString = wordStart "(A|a)ct" optLetters
- if (match($0, findString))...
- (which matches eg “act”, “Actor” but not “tract” or “Reactor”; the parentheses in wordStart
- matter, because “|” has the lowest precedence and without them “^” alone would match
- every line). It’s sometimes
- handy to use the “Set variables” dialog to set the string to be found (see $MFSLister,
- for example), or you can even read the string to be found out of the input itself, as in
- FNR == 1{find = $1; rep = $2}
- FNR > 1{gsub(find, rep)}
- which sets the strings for find and replace from the first two fields on the first line
- of input, and then uses them to do replacement on all subsequent lines.
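The word-start pattern, assembled at run time from strings, in a sketch that assumes a POSIX shell and standard awk (find_act is a hypothetical name). The two pieces are passed in with -v rather than assigned inside the program, purely to avoid quoting the apostrophe inside a shell string. The parentheses around wordStart's alternatives are what keep "^" from matching every line on its own.

```shell
#!/bin/sh
# find_act (hypothetical helper): a dynamic regular expression built by
# concatenating strings, then used with match().
# The parentheses around wordStart keep its "|" from splitting the whole
# pattern in two.
find_act() {
    awk -v wordStart="(^|[^a-zA-Z'-])" -v optLetters="[a-zA-Z'-]*" '
    BEGIN { findString = wordStart "(A|a)ct" optLetters }
    match($0, findString) { print }'
}

printf '%s\n' act Actor tract Reactor 'an act' | find_act
```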
-
- A miscellany:
-
- {gsub(/->resourceid/, "->resourceID")
- gsub(/\.resourceid/, ".resourceID")
- print
- }
- copies all input to stdout, changing “resourceid” to “resourceID” when it appears
- as a member name (note $0 is used in the gsub by default).
-
- gsub("\n", "\n", multi)
- returns a count of the number of returns (newlines) in the string “multi”.
-
- gsub(/boo/, "&&s") turns “boo” into “booboos” everywhere in $0.
-
- index("abcdef", "cd") returns 3.
-
- match("abcdef", /cd/) returns 3, and sets RSTART to 3, RLENGTH to 2.
-
- z = split("hour:minute:second", arr, ":") assigns 3 to z, with
- arr[1] = "hour", arr[2] = "minute", arr[3] = "second".
-
- Given str = "Now is the time",
- substr(str,1,3) returns "Now", substr(str,8) returns "the time".
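Most of the results quoted in this miscellany can be reproduced in standard awk (assumed here, along with the hypothetical names string_demo and member_fix). match() is called on its own line before RSTART and RLENGTH are printed, so as not to depend on the order in which print evaluates its arguments.

```shell
#!/bin/sh
# string_demo (hypothetical helper): the results quoted above, reproduced
# with a standard awk.
string_demo() {
    awk 'BEGIN {
        print index("abcdef", "cd")              # 3
        p = match("abcdef", /cd/)                # call first, then read
        print p, RSTART, RLENGTH                 # 3 3 2
        z = split("hour:minute:second", arr, ":")
        print z, arr[1], arr[2], arr[3]          # 3 hour minute second
        str = "Now is the time"
        print substr(str, 1, 3) "|" substr(str, 8)   # Now|the time
        multi = "one\ntwo\nthree"
        print gsub(/\n/, "\n", multi)            # 2 newlines counted
    }'
}

# member_fix (hypothetical helper): the resourceid example, with the
# explicit print that sends each changed line to stdout.
member_fix() {
    awk '{
        gsub(/->resourceid/, "->resourceID")
        gsub(/\.resourceid/, ".resourceID")
        print
    }'
}

string_demo
printf '%s\n' 'p->resourceid = r.resourceid; myresourceid = 0;' | member_fix
```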
-
- More examples follow the next section.
-
- Control-flow statements
- Statements in hAWK may be grouped with curly braces, one can execute statements only
- when a certain condition is met, and statements can be repeatedly executed according to
- the value of some condition. While hAWK does not have a “goto”, it does allow you to
- jump back to the top of your pattern–action statements with “next”, or jump to your
- END statements on the way out the door with “exit”.
-
- In the following list of control statements, any instance of “statement” can be replaced
- by a group of statements enclosed in curly braces {}:
-
- { statements }
- Simple grouping of several statements together, so that conditional or repeated
- execution can be applied to the group.
- if (condition) statement1 [ else statement2 ]
- If the condition evaluates to true then statement1 is carried out; the “else”
- clause is optional, and its statement2 will be executed if the condition is false.
- while (condition) statement
- The condition is first evaluated, and if it is false then the statement is skipped. If
- it is true then the statement is executed; the condition is again evaluated, and the
- statements again executed if the condition is true, and this process continues until
- the condition is false. Note that if the condition is false the first time then the statement
- will not be executed at all. “while” loops are affected by break and continue statements.
- do statement while (condition)
- The statement is always executed at least once; then the condition is evaluated, and if it
- is true then the statement is executed again. This process continues until the condition
- is false. Unlike the “while” loop, the “do” loop always executes its statement at least
- once.
- for (expr1; expr2; expr3) statement
- eg “for (i = 1; i <= 6; ++i) {print i}”
- Mnemonically, “for it’s (a jolly good fellow)” helps: in “it’s”, the “i” stands for
- initialization, the “t” for “test”, and the “s” for “step”. expr1 is the initialization,
- executed only once, just before the “for” loop proper is entered. Next
- expr2, the test, is evaluated, and if it is true then the statement is executed, otherwise
- the for loop ends and control passes to the next statement beyond it. If the statement is
- executed then expr3, the step, is carried out, and then it’s back to the top of the loop
- —no more initialization, but the sequence test, execute, step, continues until the test
- produces false.
- for (var in array) statement
- Indexes for the array are retrieved one–by–one to the variable “var”, though not
- in a readily predictable order, and the statement is executed for each index.
- break
- For use only among the statements that make up the body of a while, do, or for loop.
- Usually found in the form “if (condition) break;”, when the break is executed then
- control immediately passes to the next statement after the loop.
- continue
- Also for use only in a while, do, or for loop, and also usually executed only when
- the condition of some if–statement is true. When encountered, control passes to the
- very end of the statements making up the body of the loop, and the next iteration of
- the loop begins.
- next
- Stop processing the current input record. The next input record is read and
- processing starts over with the first pattern in the hAWK program. If the end of
- the input data is reached, the END block(s), if any, are executed.
- exit [ expression ]
- In an END action, exit truly causes the hAWK program to terminate. Anywhere
- else, the exit statement causes the program to jump to the END actions, and only
- if none are present does the program immediately terminate. The “expression”
- is provided for compatibility with standard AWK programs, and won’t be of any
- use to you.
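A small sketch of continue, break, do and next, assuming a POSIX shell and standard awk (loop_demo and skip_comments are hypothetical names).

```shell
#!/bin/sh
# loop_demo (hypothetical helper): continue, break and do-while.
loop_demo() {
    awk 'BEGIN {
        s = ""
        for (i = 1; i <= 6; ++i) {
            if (i == 2) continue     # skip the rest of this pass
            if (i == 5) break        # leave the loop altogether
            s = s " " i
        }
        print substr(s, 2)           # 1 3 4
        n = 0
        do {
            ++n
        } while (n < 0)              # false at once, but the body ran
        print n                      # 1
    }'
}

# skip_comments (hypothetical helper): "next" abandons the current line,
# so the print action never sees lines that start with "#".
skip_comments() {
    awk '/^#/ { next } { print }'
}

loop_demo
printf '%s\n' '# note' 'code' '#x' 'more' | skip_comments
```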
-
- Here’s a small sample program, with lots of potential if you’re looking for
- a first hAWK project:
- BEGIN { find = "(^|[^@])([A-Z][A-Z]+)" #note \1 \2 grouping by ()()
- rep["CA"] = "California"
- rep["HYPO"] = "hypobetalipoproteinemia"
- rep["RE"] = "regular expression"
- #...etc... note just a part of a word is OK
- }
- {loopCount = 0;
- while (match($0, find) && loopCount++ < 50)
- {
- acronym= substr($0, RSTART, RLENGTH)
- gsub(/[^A-Z@#]/, "", acronym) #or sub(find, "\2", acronym)
- if (acronym in rep)
- sub(find, "\1" rep[acronym])#replace acronym by expansion
- else
- sub(find, "\1@#@\2")#stick '@#@' in front of unknown acronym
- }
- if (loopCount >= 50)
- {
- print "The acronym", acronym, "is looping forever." ; exit
- }
- gsub(/@#@/, "")#trim the protector by replacing it with null string
- print #print the altered line to stdout
- }
- - builds a glossary at the beginning, and then expands any acronyms in the input for
- which there is an entry in the array “rep”, sending the expanded version to stdout.
- The “sub” and “match” both match the leftmost longest string of uppercase letters,
- and replacement is done one match at a time until the line contains no more matches.
- To avoid an endless loop, finds for which there is no expansion have a '@#@' stuck in
- front of them. This '@#@' is trimmed away after.
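Standard awk has no \1 or \2 in replacement strings, so the program above will not run unchanged outside hAWK. The sketch below uses a different technique, not the manual's: instead of marking unknown acronyms with '@#@', it copies the text before each match into a separate output string and carries on with the rest of the line, so no text is ever rescanned and no loop counter is needed. The name expand_acronyms and its two glossary entries are hypothetical, and a standard awk is assumed.

```shell
#!/bin/sh
# expand_acronyms (hypothetical helper): replace runs of two or more
# capitals by their glossary expansion, leaving unknown runs untouched.
expand_acronyms() {
    awk 'BEGIN {
        rep["CA"] = "California"
        rep["RE"] = "regular expression"
    }
    {
        out = ""; rest = $0
        while (match(rest, /[A-Z][A-Z]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            piece = (word in rep) ? rep[word] : word
            out = out substr(rest, 1, RSTART - 1) piece   # text before, then piece
            rest = substr(rest, RSTART + RLENGTH)          # scan only what follows
        }
        print out rest
    }'
}

printf '%s\n' 'CA uses RE and XYZ' | expand_acronyms
```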
-
- A silly example:
- #print arr[] elements with index, according to value of “sequence” string:
- #use as much variety as possible, to avoid boredom. For the numeric sequences
- #(“up” and “down”), “arrMax” holds the maximum index.
- if (sequence == "up")#Numeric increasing index
- {
- i = 1;
- do
- {
- print i, arr[i]
- } while (++i <= arrMax);
- }
- else if (sequence == "down")#Numeric decreasing index
- {
- i = arrMax;
- while (i >= 1)
- {
- print i, arr[i]
- --i
- }
- }
- else if (sequence == "associative")#Arbitrary indexes
- {
- for (i in arr)
- {
- print i, arr[i]
- }
- }
- else
- {
- print sequence, "???!!!!"
- print "Repeat after me, ten times:"
- for (i = 1; i <= 10; ++i)
- print "I will proofread my programs."
- exit
- }
-
-
- Virtually all of the sample programs in the “hAWK programs” folder illustrate
- control–flow statements.
-
- Empty statements
- The empty statement, which does nothing at all, is denoted by a semicolon. Loops
- require a body of some sort, so if you want no statements executed in the body of a
- loop, use a single semicolon as the body. More rarely, an empty statement is useful as
- the body of an “if” statement.
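For illustration, here is a small sketch in standard POSIX awk, wrapped in a Unix shell function so it can be run directly (an assumption: hAWK is normally called from an application, not a shell). The hypothetical function LeadingBlanks counts the blanks at the start of a line using a “for” loop whose body is just the empty statement:

```shell
#!/bin/sh
# Count leading blanks with a for loop whose body is the empty statement ";".
# leading_blanks and LeadingBlanks are hypothetical names for this illustration.
leading_blanks() {
  awk '
    function LeadingBlanks(s,    i) {      # i is a local variable
        for (i = 1; substr(s, i, 1) == " "; ++i)
            ;                              # empty statement: the loop header does the work
        return i - 1
    }
    { print LeadingBlanks($0) }            # $0 keeps its leading blanks
  '
}

printf '   three\nnone\n' | leading_blanks
```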
-
- ------------------
- User-defined functions
- ------------------
- Functions in hAWK take the form:
- "function" name(parameter1, parameter2, ..., local1, local2, ...)
- {
- statements
- }
- They are executed when called from within an action statement (or as part of a pattern).
-
- hAWK function definitions begin with the keyword “function”, and no return type is
- declared, though a value may optionally be returned. Local variables are listed after the
- parameters for the function, more to simplify the grammar of the language than
- anything else. Scalar parameters are passed by value (ie a local copy is made for the
- function, and the original variable in the function call is not touched by the function)
- whereas array parameters are passed by reference (the parameter array name refers
- to the same array that is provided as the argument). Function definitions must be placed
- at the top level of your program outside any pattern–action blocks, and you generally end
- up with a readable program if you put all of your function definitions at the end of your
- program.
-
- Here’s a typical function:
- function Swap(a, i, j,   temp)
- {
- temp = a[i]
- a[i] = a[j]
- a[j] = temp
- }
- When called, it appears for example as
- arr[1] = 7; arr[4] = 9; Swap(arr, 1, 4)
- which results in arr[1] = 9, arr[4] = 7. Note that the “temp” variable is intended for
- use only within the Swap function, and is a local variable rather than a parameter of
- the function.
-
- Local variables are reset each time the function is called, acting as 0 in numeric
- contexts and as "" in string contexts. No space should be put between the function name
- and the '(' of the argument list when calling one of your own functions, to avoid
- invoking the simple–minded concatenation operator.
-
- Functions may return an expression, as in
- function SumArraySquared(a,   sum, i)
- {
- for (i in a) #unlike C, array size need not be known separately
- sum += a[i]#note sum and i are local, and sum starts at zero
- return sum*sum
- }
- or
- function StringUpTo(str, upto)
- {
- return substr(str, 1, index(str, upto) - 1)
- }
- (eg StringUpTo("This is: a test", ":") would return "This is").
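The points above can be tried out in this sketch, written for standard POSIX awk and run from a Unix shell (an assumption, not hAWK’s own setup). Swap is the version with a comma before the local “temp”; Bump is a hypothetical helper showing that a scalar argument is copied, while the array passed to Swap is changed in place:

```shell
#!/bin/sh
# User-defined functions in standard awk:
#  - extra parameters after the real ones serve as local variables,
#  - arrays are passed by reference, scalars by value.
func_demo() {
  awk '
    function Swap(a, i, j,    temp) {     # temp is local
        temp = a[i]
        a[i] = a[j]
        a[j] = temp
    }
    function Bump(n) {                    # n is a copy; the caller is unaffected
        n = n + 100
        return n
    }
    function StringUpTo(str, upto) {
        return substr(str, 1, index(str, upto) - 1)
    }
    BEGIN {
        arr[1] = 7; arr[4] = 9
        Swap(arr, 1, 4)
        print arr[1], arr[4]              # the array itself was changed
        x = 5
        y = Bump(x)
        print x, y                        # x keeps its old value
        print StringUpTo("This is: a test", ":")
    }'
}

func_demo
```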
-
- Some details about functions:
- Newlines are optional after the left curly brace of the function body and before the
- closing right brace.
- Functions may call each other and may be recursive.
- The word func may be used in place of function. For tired typers only.
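A recursive function, sketched in standard POSIX awk run from a Unix shell (an assumption; the hypothetical Factorial is not one of the supplied programs). Each call gets its own copy of the parameter n:

```shell
#!/bin/sh
# Factorial calls itself until n <= 1; each call has its own n.
factorial_demo() {
  awk '
    function Factorial(n) {
        if (n <= 1)
            return 1
        return n * Factorial(n - 1)
    }
    { print $1 "! =", Factorial($1) }'
}

printf '5\n10\n' | factorial_demo
```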
-
- -------
- Output
- -------
- The “print” statement
- “print” sends simply–formatted strings to a file, stdout by default. The expressions
- supplied to the print statement are separated from one another by commas, and may
- also be entirely surrounded by parentheses. The variations are
- print
- print expression1, expression2, ..., expressionN
- print (expression1, expression2, ..., expressionN)
- A “print” with no expressions is an abbreviation for
- print $0
- Each expression is converted to a string and printed in turn, with each comma being
- replaced by the built–in variable OFS, by default a single blank. Each print statement
- is terminated with the built–in ORS, by default a newline.
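A short sketch, assuming standard POSIX awk run from a Unix shell, showing where OFS and ORS appear. Only the commas in a print statement are replaced by OFS; a bare “print” outputs $0 as it stands:

```shell
#!/bin/sh
# OFS replaces each comma in print; ORS ends every print statement.
ofs_ors_demo() {
  awk 'BEGIN { OFS = "-"; ORS = ";\n" }
       {
           print $1, $2    # the comma becomes OFS
           print $1 $2     # no comma: plain concatenation, no separator
           print           # same as print $0; the record itself is unchanged
       }'
}

printf 'a b\n' | ofs_ors_demo
```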
-
- The parenthesized version of “print” is necessary if relational operators are present
- in the expressions, since the '>' operator can mean “greater than” or “redirect output
- to the file...”—see “Output into files” below.
-
- The print statement is used in virtually every sample program provided, and the
- more–sophisticated “printf” is seldom seen since fancy formatting is not often needed.
-
- Some common print statements are
- print "" #prints just a blank line
- print names[z], FNR #documents location of something by printing file name and line
- (search this file from the top for “names[z]” if you missed it)
-
- The “printf” statement
- This statement also has a parenthesized and unparenthesized form,
- printf format, expression1, expression2, ..., expressionN
- printf(format, expression1, expression2, ..., expressionN)
- and, as with “print”, the parentheses are needed only if a relational operator
- is contained in one of the expressions. The “format” argument is interpreted
- as a string, and may contain both literal text to be printed and format
- specifications for strings or numbers to be printed. Format specs are indicated
- in the format string by a '%', and there should be one expression following the
- format for each format specification—eg if you specify that a string, a number,
- and a string be printed, then you list the string, number, and string after the
- format, in the same order, separated by commas.
-
- The hAWK versions of the printf and sprintf functions accept the following
- conversion specification formats, entirely borrowed from C:
- %c an ASCII character. If the argument used for %c is numeric, it is treated as
- a character and printed. Otherwise, the argument is assumed to be a string,
- and only the first character of that string is printed.
- %d a decimal number (the integer part).
- %i just like %d .
- %e a floating point number of the form [-]d.dddddde[+-]dd .
- %f a floating point number of the form [-]ddd.dddddd .
- %g use e or f conversion, whichever is shorter, with nonsignificant zeros
- suppressed.
- %o an unsigned octal number (again, an integer).
- %s a character string.
- %x an unsigned hexadecimal number (an integer).
- %X like %x , but using ABCDEF instead of abcdef .
- %% a single % character; no argument is converted.
-
- There are optional, additional parameters that may lie between the % and the control
- letter (also from C):
- - the expression should be left justified within its field (note if the '-'
- is absent then the expression is right justified)
- width the field should be padded to this width. If the width is written with a
- leading zero (eg %05d), the field is padded with zeros; otherwise it is padded
- with blanks.
- . prec a number giving the maximum number of characters to print from a string,
- or the number of digits to the right of the decimal point.
- For example, %-23.14s prints strings in a field 23 characters wide, left justified,
- printing at most 14 characters from the string. And %8.4f will print a floating point
- number in a field 8 characters wide, right justified, with 4 digits to the right of the
- decimal point.
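The following sketch, assuming standard POSIX awk (whose printf follows the same C conventions) run from a Unix shell, prints each value between brackets so that the padding is visible:

```shell
#!/bin/sh
# Conversion letters, flags, width and precision, bracketed to show padding.
printf_demo() {
  awk 'BEGIN {
      printf "[%-6s]\n", "ab"          # left justified in 6 columns
      printf "[%6s]\n", "ab"           # right justified in 6 columns
      printf "[%.3s]\n", "abcdef"      # at most 3 characters of the string
      printf "[%05d]\n", 42            # zero padded to width 5
      printf "[%8.4f]\n", 3.14159      # width 8, 4 digits after the point
      printf "[%c%c]\n", 65, "Bz"      # number -> character, string -> first char
      printf "[%x %X %o %%]\n", 255, 255, 8
  }'
}

printf_demo
```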
-
- The dynamic width and prec capabilities of the C library printf routines are not
- supported. However, they may be simulated by using the hAWK concatenation operation
- to build up a format specification dynamically.
-
- Some examples:
- “print var” always appends the value of ORS (by default a newline); to avoid this, use
- printf("%s ", var)
- and when a newline is needed, supply one yourself with something like
- print "" or printf("%s\n", var).
-
- Given strings of variable width in fields $1 and $2, reformat to print these strings
- right–justified in two nicely–lined–up columns:
- { one[++n] = $1
- two[n] = $2
- if (w1 < length($1))
- w1 = length($1)
- if (w2 < length($2))
- w2 = length($2)
-
- }
- END {w1 += 2; w2 += 2;#a couple of spaces between columns
- for (i = 1; i <= n; ++i)
- printf "%" w1 "s" "%" w2 "s\n", one[i], two[i]
- }
- —this illustrates using the hAWK concatenation operation “to build up a format
- specification dynamically”; for example, if w1 = 9 and w2 = 15 (after adding 2) then
- we get
- printf "%9s%15s\n", one[i], two[i]
- as the effective printf statement.
-
- Output into files
- By default, “print” and “printf” send all of their output to stdout. However, the
- redirection operators '>' and '>>' allow you to send output to any text file.
- Redirecting output takes one of the forms
- print expression–list > outfile
- print(expression–list) > outfile
- printf format, expression–list > outfile
- printf(format, expression–list) > outfile
- print > outfile
- or any of those with '>>' instead of '>'. The '>' operator erases the contents of outfile
- when it first opens the file, whereas '>>' appends what is printed to outfile without
- clearing the file first. Both operators open the file “outfile” the first time it is
- encountered in the program, and keep it open, so later '>' writes to the same file add to
- what is already there. The file will be closed for you at the end of your program, but if
- you have many files to write to you should close each output file yourself when you are
- done with it, with “close(outfile)”.
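A sketch of redirection and closing, assuming standard POSIX awk run from a Unix shell, with Unix “/” paths instead of hAWK’s full Mac path names. The hypothetical split_by_key writes field 2 of each line to a file named after field 1, closing each file once its key is finished; the input is assumed to be sorted on field 1:

```shell
#!/bin/sh
# One output file per key in field 1; each file is closed when its key ends.
split_by_key() {   # $1 = directory to write into
  awk -v dir="$1" '
    NR == 1 || $1 != current {            # a new key starts a new output file
        if (NR > 1)
            close(outfile)                # finished with the previous file
        current = $1
        outfile = dir "/" current ".txt"
        if (firstFile == "")
            firstFile = outfile
    }
    { print $2 > outfile }                # ">" empties the file only when first opened
    END {
        close(outfile)
        if (firstFile != "")
            print "(appended later)" >> firstFile   # ">>" keeps what is already there
    }'
}

out_dir=$(mktemp -d)
printf 'a 1\na 2\nb 3\n' | split_by_key "$out_dir"
cat "$out_dir/a.txt" "$out_dir/b.txt"
```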
-
- hAWK deals with full path names only, and the names of all output files must be full path
- names if you want the file to end up in a predictable place. Since hAWK is adept at
- manipulating strings, and a file name is just a string, you can manufacture file names
- and paths within your program to fit most needs. The built–in variable STDPATH contains
- the path leading to your stdout file, so concatenating a file name to the end of STDPATH, as
- in
- outfile = STDPATH "Search Results"
- will allow you to write files to the folder containing your stdout file, which is your
- THINK C/Drag_on Modules folder if you followed installation suggestions. The simplest way
- to concoct the appropriate path name for an arbitrary location on your hard disk(s) is
- to run the hAWK program “$EchoFullPathNames”, choosing a text file in the desired
- location as the input for the program. This will give you the explicit full path name, eg
- Disk:C Projects:Banana INIT:Banana source:In_your_ear.c
- from which you can copy the path to use as prefix for output file names, in this case
- Disk:C Projects:Banana INIT:Banana source:
- (neglect not that last colon!)
-
- As special cases you can use the names "stderr" and "stdout" to redirect output
- to your stderr and stdout files, eg
- print "Serious interstitial vacuities have been detected" > "stderr"
- which will quietly write the message to your stderr file—you won’t be notified
- that anything has been written there. Normally there isn’t much use for redirecting
- output to "stdout" since it goes there anyway by default.
-
- If your current input file happens to be in the right location for the output you intend to
- write (for example, if the output is to be an altered version of the input, saved under a
- different name) you can extract the path part of the input name, and tack it on to the
- beginning of your output file name to produce the needed full path name with this:
- BEGIN {baseName = "Results"}#a fixed name for this little example
- FNR == 1{#at the first line of each input file
- outfile = baseName;#start afresh, or the path would pile up over several input files
- z = split(FILENAME, names, ":");#fragment the full path into the array “names”
- for (i = z-1; i >= 1; --i) #note i = z gives the input file name proper
- outfile = names[i] ":" outfile;#put path in front of outfile name
- }
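On a system with Unix-style paths the same split-and-rebuild loop works with “/” in place of “:”. This is a sketch assuming standard POSIX awk run from a shell; output_next_to_input is a hypothetical name, and it only prints the path it would use rather than writing anything:

```shell
#!/bin/sh
# Build "Results" in the same directory as each input file, using "/" paths.
output_next_to_input() {   # arguments: existing input files
  awk '
    FNR == 1 {
        outfile = "Results"                 # reset for every input file
        z = split(FILENAME, names, "/")     # names[z] is the file name proper
        for (i = z - 1; i >= 1; --i)
            outfile = names[i] "/" outfile  # put each directory back in front
        print outfile
    }' "$@"
}

# Example (the file must exist): output_next_to_input /tmp/project/notes.txt
# would print /tmp/project/Results
```

Because an absolute path starts with “/”, names[1] is empty, and rebuilding it restores the leading slash.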
-
- Can you tell what this program does?
- FNR == 1{z = split(FILENAME, names, ":");
- outfile = names[z];
- if (match(outfile, /[0-9]+\.[cChH]$/) > 0)
- {#file name ends in number.c or the like
- versNumber = substr(outfile, RSTART, RLENGTH - 2);#just the number
- ++versNumber;
- versNumber = versNumber ".c";
- sub(/[0-9]+\.c$/, versNumber, outfile);
- }
- else
- {
- print FILENAME, "does not end in number dot c or h, quitting early"
- exit
- }
- for (i = z-1; i >= 1; --i)
- outfile = names[i] ":" outfile
- }
- {print > outfile}
- —among other things, it fills up your disk pretty quick. (See $TabsToSpaces.)
-
- Closing files
- To close a file named by expr, use
- close(expr)
- This could be a fairly explicit name, such as
- close (STDPATH "Results")
- where concatenation is used to create the full name, or it could be simple
- close(outfile)
- where outfile holds the string that is the full path name for the file being closed.
-
- If you write to a file, then you must close it before subsequently reading from it. More
- importantly, there is a limit on the number of files that can be open at once, so if your
- program writes to a large or arbitrary number of files it is good policy to close each file
- when it is completed. As you will see just below, it is also possible to take input from
- an arbitrary file by means of redirection with the “getline” function, and in this case
- as well it pays to close a file when you are done with it.
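A sketch of the write, close, read-back sequence, assuming standard POSIX awk run from a Unix shell (write_then_read is a hypothetical name). Without the first close, buffered output may not yet be in the file when getline tries to read it:

```shell
#!/bin/sh
# Write two lines, close the file, then count the lines read back with getline.
write_then_read() {   # $1 = path of a scratch file
  awk -v f="$1" 'BEGIN {
      print "first"  > f
      print "second" > f
      close(f)                            # flush and release it before reading
      n = 0
      while ((getline line < f) > 0)      # parentheses keep the comparison unambiguous
          n++
      close(f)                            # release the input side too
      print n " lines read back"
  }'
}

write_then_read "$(mktemp)"
```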
-
- ------
- Input
- ------
- FS, the input field separator
- If you leave FS set to its default value of a single space, then any combination of
- blanks and tabs will count as the field separator, and as a “bonus” any leading
- blanks or tabs will be removed from the first field of each record, though they will
- remain in the record itself (ie $1 is trimmed but $0 is not).
-
- FS is slightly odd in that it has two modes of interpretation; when it is a single character
- such as FS = ":" then the single literal character (no matter what it is) is taken as the
- input field separator, but if the string for FS is longer than a single character it is
- interpreted as a regular expression. Here are some commonly–used field separators:
- FS = "[ ]" —necessary if you wish the field separator set to a single space, since
- FS = " " invokes the default behaviour described above
- FS = "[ ,\t]+" —any mix of blanks, commas, and tabs
- FS = "\n" —a field is a complete line (see the discussion in the next section).
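To see the difference between these settings, this sketch (standard POSIX awk run from a Unix shell, an assumption) prints the number of fields and the first field for a given FS:

```shell
#!/bin/sh
# Print NF and $1 for the FS given as the first argument.
count_fields() {   # $1 = value for FS
  awk -v fs="$1" 'BEGIN { FS = fs } { print NF ": [" $1 "]" }'
}

printf '  a  b\n'  | count_fields ' '         # prints 2: [a]
printf '  a  b\n'  | count_fields '[ ]'       # prints 5: []
printf 'a:b::c\n'  | count_fields ':'         # prints 4: [a]
printf 'a, b\tc\n' | count_fields '[ ,\t]+'   # prints 3: [a]
```

With '[ ]' every single blank separates fields, so the leading and doubled blanks produce empty fields.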
-
- RS, the input record separator
- In practice RS is either left at its default value of "\n" (ie a record is the same as a line)
- or, if needed, set to the null string "", in which case records are separated by one
- or more blank lines. The latter corresponds to a simple form of database, with all the
- lines of each record grouped together and blank lines between records. With these
- multi–line records it is often useful to also set the field separator FS to "\n", so that
- a field becomes a complete line.
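A sketch of the multi–line record setup, assuming standard POSIX awk run from a Unix shell; the two name-and-address records are made-up sample data:

```shell
#!/bin/sh
# Paragraph mode: blank lines separate records, and each line is one field.
first_lines() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { print NR ": " $1 " (" NF " lines)" }'
}

printf 'Smith\n12 Elm St\n\n\nJones\nBox 9\nSpringfield\n' | first_lines
```

Note that two blank lines in a row still count as a single record separator.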
-
- Alas, these simple conceptions of a record are not often adequate. Narrative text and C
- source files require a more flexible approach to input which can be generally stated as
- “grab enough input to do the current job, and never mind where the lines end”. Several
- solutions are discussed in the “Beyond input records” section of “Advanced
- topics”, but don’t skip over the next section on “getline”, because it plays a
- strong supporting role.
-
- The “getline” function
- “getline” is a built–in function that allows you to retrieve input records from the current
- input file or from any other file. As you know, the default behaviour of a hAWK program is
- to retrieve input from your input files one record at a time, marching through the records
- and files from beginning to end. Often, however, one needs to read in a group of lines until
- some condition is met, or interrupt regular input to retrieve records from some other file,
- and these are the special capabilities that “getline” provides. It can be used in the following
- ways:
- getline sets $0 from next input record; sets NF, NR, FNR .
- getline < file sets $0 from next record of file; sets NF .
- getline var sets var from next input record; sets NR, FNR .
- getline var < file sets var from next record of file .
- and in all cases “getline” returns 1 if a record was successfully retrieved, 0 if the end of file
- was encountered, and -1 if some problem occurred, such as failure to find the file.
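The three return values can be seen with this sketch, assuming standard POSIX awk run from a Unix shell (getline_results is a hypothetical name). The missing file produces -1 rather than stopping the program:

```shell
#!/bin/sh
# getline returns 1 (record read), 0 (end of file) or -1 (file cannot be read).
getline_results() {   # $1 = existing one-line file, $2 = missing file
  awk -v good="$1" -v bad="$2" 'BEGIN {
      r1 = (getline line < good)    # reads the single line
      r2 = (getline line < good)    # nothing left
      r3 = (getline line < bad)     # file cannot be opened
      print r1, r2, r3
  }'
}

f=$(mktemp); echo hello > "$f"
getline_results "$f" "$f.missing"
```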
-
- The effect of “getline” by itself is to dump the current string in $0 and replace it with
- the next input record, setting all the usual built–in variables. Program execution then
- continues with the statement following “getline”. By comparison, the “next” statement
- does everything that “getline” by itself does, but in addition processing starts over
- with the first pattern in your hAWK program.
-
- If a variable name is present immediately after “getline”, then the input record is
- retrieved to the variable instead of to $0. The '<' symbol is the input redirection
- operator meaning “get input from the file...”, and is followed by the name of the input
- file to use. Note that file names must be full path names, as is always the case in hAWK.
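One common use of “getline var” is joining a line that ends in a backslash with the line after it. This is a sketch assuming standard POSIX awk run from a Unix shell; join_continued is a hypothetical name:

```shell
#!/bin/sh
# Join lines ending in a backslash with the following line(s).
join_continued() {
  awk '{
      line = $0
      while (line ~ /\\$/ && (getline more) > 0)     # "getline var" leaves $0 alone
          line = substr(line, 1, length(line) - 1) more
      print line
  }'
}

printf 'one \\\ntwo\nthree\n' | join_continued
```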
-
- Some examples:
- $MFS_SuperLister uses a buffer holding a variable number of lines, to match regular
- expressions that can span more than one line. The heart of this program is the action
- {multi = $0;#the first line is already there
- while (getline x > 0)#== 0 at end of file, < 0 for error
- {
- multi = multi "\n" x;
- ...
- }
- }
- which employs a “getline” to retrieve the contents of the current input file from the
- second line to the end of the file (the first line is already present in $0). This program
- is discussed further in the “Beyond input records” section of “Advanced topics”.
-
- $FilesInOrderTest illustrates the technique of reading in a list of input files, then setting
- up the built–in variables so that those files will be used as input for a program. In other
- words, the program receives a single input file which lists the actual input files to use;
- this file is read at the start of the program, and used to set up the built–in array ARGV[]
- so that the program will be “fooled” into taking input from the specified list of files.
- The list of files is read in at the beginning with
- BEGIN {while (getline _specific_file_ < ARGV[1] > 0)
- {
- if (length(_specific_file_) > 1 &&
- index(_specific_file_, ":") > 0)
- ARGV[ARGC++] = _specific_file_;
- }
- close(ARGV[1]);
- ARGV[1] = "";
- }
- which reads in the full path names for the input files (one name per line) from the
- first input file (ARGV[1]) into the variable “_specific_file_”. This program is
- discussed further in the “Other ways of specifying input” section of “Advanced
- topics”.
-
- ----------------
- The “hAWK” function
- ----------------
- hAWK ( arr ) : executes the hAWK program specified by the array "arr", returns
- the “recursive depth” at which the call was executed. The array holds the command–line
- arguments to be passed to the new program, indexed 0,1,2.... The hAWK() function is a
- recursive call to hAWK itself, with all built–in variables reset to their initial values.
- “hAWK” can be called anywhere a function can be called (ie in an action or function, but
- not a pattern). It’s just like calling hAWK from the menu, but you don’t get a dialog
- so all arguments must be explicitly supplied. If the discussion below of what to put in
- "arr" seems a bit brief, see also “The command line and ARGV[]”.
-
- Each call to hAWK() does chew up some memory which is not freed until
- all hAWK programs terminate, so there is some finite limit on the number of times that
- hAWK() can be called. In addition, memory that your program allocates by creating
- arrays is not automatically freed, so if the program called by hAWK() is not the last
- thing that will be done then large arrays should be “emptied out” with something like
- for (w in array)
- delete array[w]
- —this memory will then be available for other programs.
-
- While hAWK() can be used to sequentially execute several small programs
- (see $Chain), more typically it is used to execute just one program—a program
- which is specially created by the calling program to do just the task required.
-
- The primary advantage offered by calling another program from within a program
- is that you can select, or even create, the program to be run after doing some
- preliminary analysis (reading a file or looking at the preset variables), and the
- program which is eventually run will be faster than a more general–purpose one.
-
- $MFS_SuperReplace for example creates a special search–and–replace program
- to do the s&r you specify with your “find” and “replace” variables, in which the
- regular expression to search for is an explicit string rather than the content of
- a variable (ditto the replace string). The advantage is that an explicit regular
- expression is analyzed only once at the start of a program, whereas a variable
- (dynamic) regular expression is re–analyzed every time it is used, even if its
- contents don’t change. The special–purpose program takes a moment to get going,
- but then runs noticeably faster than a general–purpose search–and–replace program
- which uses variables.
-
- The general incantation to follow for creating the command–line array "arr" is:
- if (notFirstCall) #needed only if making more than one hAWK() call
- {
- x = 0; #arr[] is indexed 0 up - reset to 0 if making more than one call
- for (w in arr)
- delete arr[w]; #Avoid passing spurious arguments from last hAWK() call
- }
- arr[x++] = "hAWK"; #The command name in arr[0], anything you like, really.
- arr[x++] = "-f" programName; #Full path name, eg
- #programName = STDPATH "Drag_on Modules:hAWK programs:" "Type&Run program"
- arr[x++] = "-f" FirstLibrary; #Full path name. The "-f" indicates a program name
- ...
- arr[x++] = "-f" LastLibrary;
- arr[x++] = "-v" "firstVar=" firstVar; #Preset variables. "-v" indicates a variable
- arr[x++] = "-v" "secondVar=73"; #Value can be hard-set too
- ...
- arr[x++] = "-v" "lastVar=" lastVar
- arr[x++] = "--" #Signals only input files, if anything, follow
- arr[x++] = FirstInputFile #Full path name
- ...
- arr[x++] = LastInputFile
- notFirstCall = 1; #Needed only if making more than one call to hAWK()
- depth = hAWK(arr); #invoke the program; returned value can be ignored.
- If you wish to pass all input files along to the program being called, use
- for (j = 1; j < ARGC; ++j)
- arr[x++] = ARGV[j]
- If you wish to use stdout as the input, use
- arr[x++] = STDPATH "$tempStdOut"
- For some real examples, see $Chain, $Type&Run, $RunClip, and $MFS_SuperReplace.
-
- Note that no argument count “argc” needs to be passed to the hAWK() call; internally,
- the end of arguments is detected by looking for 10 consecutive null arguments (eg if
- arr[8] is non-null and arr[9] through arr[18] are all "", then arr[8] is taken as the last
- real argument).
-
- A small bonus; when calling a hAWK program through the main dialog interface you
- are limited to presetting at most 10 variables, but when using the hAWK() function
- there is no limit on the number of variables you can preset.
-
- -------------
- Advanced topics
- -------------
- “Advanced” is a bit pompous, really—you should have read through the above
- material, tried out some of the supplied programs, and written a couple of
- small programs yourself by this point. That’s all “advanced” means. And the
- last section, “Calling hAWK through Minimal App”, is advanced only in terms
- of understanding what’s going on behind the scenes. The instructions themselves
- are easy to follow.
-
- Other ways of specifying input
- These techniques are for use when you need to run a hAWK program on several input files
- taken in some specific order, or when you need to hard–code the name of an input file into
- a program and process the contents of that file before or after all other input files.
-
- The way to persuade a hAWK program to treat input files in a specific order is to
- prepare the list of files in the order required, and then modify the program to use
- that list as the names of the input files. This requires building the list, and a small
- addition to the program itself, but it’s not hard to do:
- 1 If possible, use your calling application to select the files for multi–file operations
- (“searching”), and then run the hAWK program “$EchoFullPathNames”. hAWK uses
- full path names to specify files, and this program will produce a list of the full path
- names for the files you selected, in the window called “$tempStdOut”. You can
- painfully construct full path names for your files by hand, but using this hAWK
- program is the simpler way.
- 2 Arrange the full path names into your desired order, and if it’s a list you anticipate
- using again, use “Save As” to save the list away permanently (the contents of
- $tempStdOut don’t survive from one run to the next).
- 3 Copy this block of code to the top of your hAWK program, before all other code:
- BEGIN {while (getline _specific_file_ < ARGV[1] > 0)
- {
- if (length(_specific_file_) > 1 && index(_specific_file_, ":") > 0)
- ARGV[ARGC++] = _specific_file_;
- }
- close(ARGV[1]);
- ARGV[1] = "";
- }#end
- This is executed before the rest of your program, and transparently converts the list of
- input files in the array ARGV[] to the list provided in the one input file “ARGV[1]”
- that is actually supplied when running it. The name of that one original input file is
- nulled out, which persuades hAWK to ignore it when input processing starts for real.
- 4 When calling the hAWK program, select your list of files as the only input. If the
- list is in the front window, pick “All of front text”; if it’s in a file, use the
- “Select input file…” option to select the file. Then run the program.
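The same BEGIN block can be tried in standard POSIX awk from a Unix shell, with “/” standing in for the Mac “:” when checking that a line looks like a path (both are assumptions about the environment). The FNR == 1 rule prints each file name as it is started, confirming the order:

```shell
#!/bin/sh
# Process the files named in a list file, in the order listed.
in_listed_order() {   # $1 = file holding one full path per line
  awk '
    BEGIN {
        while ((getline _specific_file_ < ARGV[1]) > 0)
            if (length(_specific_file_) > 1 && index(_specific_file_, "/") > 0)
                ARGV[ARGC++] = _specific_file_
        close(ARGV[1])
        ARGV[1] = ""                    # an empty entry is skipped by awk
    }
    FNR == 1 { print FILENAME }         # report each file as it is started
  ' "$1"
}

# Example: in_listed_order my_file_list.txt
```

If the list turns out to contain no usable names, awk would fall back to reading standard input, so check the list first.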
-
- If you want to try this out in a test program, read through “$FilesInOrderTest”,
- then run it and pass it a list of files. It will just print the list of files to $tempStdOut,
- confirming that they were read in the correct order.
-
- If you want your program to take input from some specific file first, and then take
- input from whatever files are provided via the setup dialog, then you can pass your
- program the name of the specific file by means of a variable and process the file in a
- BEGIN block. Once again, the only real difficulty is to determine the full path name of
- the file, and this can be done by using $EchoFullPathNames as described above, but
- passing it the single file as input.
- The method in full is:
- 1 Determine the full path name of the specific file, eg
- Hard Disk:Top Folder:Bottom folder:theFile
- 2 Do the processing of this specific file in the BEGIN block of your program, in
- the following way:
- BEGIN { while (getline _x < _specific_file_ > 0)
- {
- #process _x, which contains the lines of _specific_file_
- }
- close(_specific_file_)
- #optional other statements in your BEGIN block
- }
- 3 While setting up your program for a run, use “Set variables” to provide the
- full path name of the specific file in the variable _specific_file_:
- _specific_file_=Hard Disk:Top Folder:Bottom folder:theFile
- and then click “Save settings” if you will be using this file name more than once.
- 4 Run your program, using the setup dialog to take input from wherever is
- appropriate. For an example, see “$WordFrequency”.
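As a concrete sketch of step 2, assuming standard POSIX awk run from a Unix shell: the specific file holds words to ignore, one per line, and the main input is then counted without them. This only illustrates the BEGIN-block pattern; it is not a copy of $WordFrequency:

```shell
#!/bin/sh
# Read a specific file in BEGIN, then count the remaining input words.
count_words_except() {   # $1 = file of words to skip, one per line
  awk -v _specific_file_="$1" '
    BEGIN {
        while ((getline _x < _specific_file_) > 0)
            skip[_x] = 1
        close(_specific_file_)
    }
    {
        for (i = 1; i <= NF; i++)
            if (!($i in skip))
                count++
    }
    END { print count + 0 }
  '
}

s=$(mktemp); printf 'the\nand\n' > "$s"
printf 'the cat and the hat\n' | count_words_except "$s"
```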
-
- If you want to process a special file after all regular input, then use the same
- structure as in point 2 above, but in an END block rather than a BEGIN block.
-
- If the specific file is to be treated in exactly the same way as your other input files,
- but must be processed first, then you can add this BEGIN block to the start of your
- program, again using a fixed full path name passed in the variable “_specific_file_”:
- BEGIN { for (i = ARGC; i >= 2; --i)#Note this creates ARGV[ARGC]
- ARGV[i] = ARGV[i-1];
- ARGV[1] = _specific_file_;
- ARGC++;
- }
-
- Appending a specific input file is even easier, just
- BEGIN { ARGV[ARGC++] = _specific_file_; }
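Both tricks can be combined in one BEGIN block, sketched here for standard POSIX awk run from a Unix shell. _first_file_ and _last_file_ are hypothetical variable names, passed with -v much as “Set variables” would pass them in hAWK:

```shell
#!/bin/sh
# Read one fixed file before the regular inputs and another after them.
wrap_inputs() {   # $1 = file to read first, $2 = file to read last, rest = normal inputs
  wrap_first=$1 wrap_last=$2
  shift 2
  awk -v _first_file_="$wrap_first" -v _last_file_="$wrap_last" '
    BEGIN {
        for (i = ARGC; i >= 2; --i)     # move existing arguments up one place
            ARGV[i] = ARGV[i-1]
        ARGV[1] = _first_file_
        ARGC++
        ARGV[ARGC++] = _last_file_      # appending is a one-liner
    }
    { print }
  ' "$@"
}

# Example: wrap_inputs header.txt footer.txt body1.txt body2.txt
```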
-
- You may find these techniques useful if your program needs a list of “data” before
- running, in other words too much information to fit in the ten variables that you
- can preset before each run.
-
- The built–in variable STDPATH is a path name which specifies the folder that holds,
- among other things, your “Drag_on Modules” folder, which in turn holds your “hAWK
- programs” folder. If your specific input file is in the “hAWK programs” folder for
- example, then you can avoid spelling out the full path name by using “Set variables” to
- set “_specific_file_” to just the name of the file, eg
- _specific_file_=Initial data file
- and then before using _specific_file_ insert the line
- _specific_file_ = STDPATH "Drag_on Modules:hAWK programs:" _specific_file_;
- to build up the full path name for _specific_file_.
-
- The above two methods can be blended together, for example to process an entire
- list of files before dealing with other input files provided by the setup dialog, and
- the files could be processed just as easily in an END block as in a BEGIN block.
-
- Beyond input records
- Let’s face it, not many text files are organized into neat lines or even groups of lines,
- so it is often more appropriate to use hAWK’s automated record retrieval as just the
- first stage of input, building functions on top of it to extract the precise input for the
- job at hand. Four techniques are discussed below: “control–break”, which keeps track
- of current input status by means of variables; “input on demand”, which buries the
- problem of getting the next piece of input in a single function; end–buffered input,
- which, if it reads in too much, temporarily stores the excess input to one side; and a
- rolling buffer, which acts as a multiple–line “window” on the input, the number of
- lines being variable at whim.
-
- The “control–break” style of reading input wrestles with the problem that
- you don’t know you’ve read in too much input until you’ve read in too much—what to
- do then? The general solution is to use variables to keep track of what the current
- “state” is (typically the states are “more input wanted” and “oops, a bit too much”).
- This leads to control constructs which seem to put the cart before the horse, in that
- one first takes action based on the value of a variable, and only later in the program is
- the variable set, which requires a bit of planning.
-
- As a simple illustration,
- $1 != lastFieldOne { print "New field one is", $1
- lastFieldOne = $1
- }
- which has been seen before, prints the contents of the first field on the input line
- whenever it changes. The variable “lastFieldOne” is used to control output.
-
- The general approach with control–breaks is, in pseudo–language:
- if (toofar)
- scramble to catch up;
- else
- proceed normally;
- set the toofar variable;
-
- At this point, you might want to read through an example of control–breaks: $XRef
- deals with the problem of skipping over comments and strings in C code, even though
- hAWK reads the input one line at a time and comments and strings can be anywhere.
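A small runnable control–break, assuming standard POSIX awk run from a Unix shell and input sorted on field 1. The variable “key” plays the role of “toofar”: it is tested before it is set, and the END action handles the last group:

```shell
#!/bin/sh
# Total field 2 for each run of identical keys in field 1.
group_totals() {
  awk '
    NR > 1 && $1 != key {      # too far: this line starts a new group,
        print key, sum         # so finish off the previous one first
        sum = 0
    }
    { key = $1; sum += $2 }    # then proceed normally and set the state
    END { if (NR > 0) print key, sum }   # the last group never sees a break
  '
}

printf 'apples 3\napples 4\npears 5\n' | group_totals
```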
-
- “Input on demand” is a way of using “getline” in combination with formatting functions
- to retrieve input sequentially as though the entire file were one large record, without
- cluttering up the top level of your program. The details of translating from line
- format to your required format are buried in a function that keeps track of the
- relation between the two; once this function is written, the top level of your
- program can call this function without worrying about the translation details.
-
- For a full example, see $Print_MENU_Resource, which deals with the problem of
- reading and formatting a MENU resource, as retrieved by Read Resource.
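A minimal sketch of input on demand, assuming standard POSIX awk run from a Unix shell. The hypothetical NextWord hands out one word per call, reading a new line with “getline” only when the current one is used up, so the caller never sees the line breaks:

```shell
#!/bin/sh
# NextWord hides line boundaries; _words, _n, _i and atEnd are invented names.
words_one_per_line() {
  awk '
    function NextWord(    line) {
        while (_i >= _n) {                   # current line used up (or none yet)
            if ((getline line) <= 0) {       # reads the main input
                atEnd = 1
                return ""
            }
            _n = split(line, _words)         # split on blanks and tabs
            _i = 0
        }
        return _words[++_i]
    }
    BEGIN {
        for (w = NextWord(); !atEnd; w = NextWord())
            print w
    }
  '
}

printf 'one two\n\n  three\n' | words_one_per_line
```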
-
- End–buffered input relies on retrieving input lines through two functions,
- “GetNextLine” and “UngetLine”, and a variable “inBuffer” which keeps track of
- whether a line was “ungot”. With this approach there is no need to “scramble to catch
- up”, since the extra input is stored to one side until the next “GetNextLine” call. The
- conditions under which a line is to be stored due to going too far depend on the context
- (i.e. it’s up to you), but the general approach is
- function DoTheJob(file)
- {	getError = 1;
- 	while (GetNextLine(file) > 0)
- 	{
- 		if (you decide that’s too far)
- 		{	UngetLine(curLine);	# leave it for the next GetNextLine call
- 			break;
- 		}
- 		else
- 			process curLine;
- 	}
- }
- and the functions that get and unget are
- function GetNextLine(file)
- {
- 	if (getError <= 0)
- 		return getError;	# end of file (0) or error (-1) already reached
- 	if (inBuffer)
- 	{	# hand back the line that was "ungot"
- 		curLine = _buffer;
- 		inBuffer = 0;
- 		return 1
- 	}
- 	return getError = (getline curLine < file)
- }
-
- function UngetLine(line)
- {
- 	_buffer = line
- 	inBuffer = 1
- }
- where “file” is the full path name of the file to take input from. The line read is left
- in the global variable “curLine”. A plain (non–array) function parameter is passed by
- value, so assigning to a parameter such as “line” inside GetNextLine would not change
- the caller’s variable. The “break” matters: without it, the ungot line would be fetched
- again on the next pass and ungot again, forever.
-
- For an example using end–buffered input, see “The AWK programming language” by
- Aho, Kernighan, and Weinberger, page 105. You’ll find this approach useful if you
- have small databases to analyse.
-
- The rolling–buffer approach to input adds lines of input to the end of a variable, and
- removes them from the front. The variable in question can contain more or fewer lines
- according to the needs of the moment, though there should be an upper limit on the number
- of lines. In pseudo–language, the general approach to rolling lines of input through a
- buffer variable is:
- while (getline x > 0)
- {
- 	multi = (multi == "") ? x : multi "\n" x;	#add current line x to end of buffer variable “multi”
- 	process multi however you like;
- 	while (too many lines in multi)
- 	{
- 		j = index(multi, "\n");	#position of first newline in multi
- 		#first line in multi, if needed, = substr(multi, 1, j - 1);
- 		multi = substr(multi, j + 1);	#trim first line (and its newline) from multi
- 	}
- }
- The “while (getline x > 0)” loop stops normally when the end of the current input file
- is reached. It also stops if getline reports an error (a missing file, say), though that is
- unlikely. You can count the number of lines in multi at any time with
- numMultiLines = (multi == "") ? 0 : gsub(/\n/, "\n", multi) + 1
- which replaces each newline with itself and relies on gsub returning the number of
- replacements. The lines are separated by newlines, so there is one more line than there
- are newlines. Awkward, but it works. Arbitrary chunks of text can be removed from the
- front of multi if desired, rather than removing a line at a time.
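- Here is a runnable sketch of a rolling buffer that holds at most maxLines lines. The
- function name and data are hypothetical. It reports the line number on which a target
- phrase is completed, even when the phrase is broken across a line boundary. Treating line
- breaks as single spaces when searching is an assumption of this sketch, and does not
- necessarily reflect how $MFS_SuperLister works. The line count is kept in a variable
- rather than recomputed with gsub each time.

```awk
# Rolling-buffer sketch (hypothetical name and data): lines are added to the end
# of "multi" and trimmed from the front so that at most maxLines are held.
function FindAcrossLines(text, target, maxLines,    lines, n, i, multi, count, j, flat, hits)
{
	n = split(text, lines, "\n")	# stand-in for successive getline calls
	multi = ""; count = 0; hits = ""
	for (i = 1; i <= n; i++)
	{
		multi = (count == 0) ? lines[i] : multi "\n" lines[i]	# add line to end
		count++
		while (count > maxLines)	# too many lines: trim from the front
		{
			j = index(multi, "\n")	# position of first newline
			multi = substr(multi, j + 1)
			count--
		}
		flat = multi
		gsub(/\n/, " ", flat)		# search as if the lines were one record
		if (index(flat, target) > 0)
		{
			hits = hits " " i		# line number on which the match ends
			multi = ""; count = 0	# start afresh so a match is not reported twice
		}
	}
	return substr(hits, 2)		# drop the leading space
}

BEGIN {
	text = "the quick\nbrown fox\njumps over\nthe lazy dog\nquick brown"
	result = FindAcrossLines(text, "quick brown", 2)
	print result
}
```

- Run as is, it prints 2 5. The first match spans lines 1 and 2, and the second lies
- entirely on line 5, after the buffer has rolled past lines 3 and 4.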
-
- For a full and very useful example see “$MFS_SuperLister”, which is capable
- of matching a regular expression or string of text even if it spans a variable number
- of lines. “$MFS_SuperReplace” is similar, doing multi–file search and replace instead
- of just listing matches.
-
- Calling hAWK through Minimal App
- Minimal App does not support passing text or file lists to hAWK, or showing results
- after a run, but these things can be done with a bit of extra work on your part. If
- you’re not interested in using Minimal App or some other application that provides
- minimal support for hAWK as your main hAWK–caller, you can skip this section.
-
- Since Minimal App does not support text documents at all, you’ll need an editor of some
- sort in order to do these things, and the assumption here will be that you’re running
- under MultiFinder (or System 7), using your favourite editor. You could also use a
- Desk Accessory editor together with Minimal App, a practical alternative if you intend
- to do nothing but run hAWK programs for an extended period. However, the focus here is
- on running hAWK programs while using an editor that does not support calling hAWK,
- by using Minimal App, MultiFinder, and a few workarounds.
-
- Ideally, an editor designed to run under MultiFinder should offer you protection against
- creating multiple versions of a file, and provide some automatic means of ensuring that
- you are always viewing the most up-to-date version of a file. An adequate solution in a
- single–user context would be for all editors to cooperate by offering the options of
- automatically saving all open files when switching out, and refreshing all open files
- from disk (if necessary) when switching back. At present almost all Macintosh editors
- are, in this sense, MultiFinder–unaware. So unless you know otherwise, it’s up to you
- to ensure that you keep the screen and disk versions of a file synchronised by Saving and
- Reverting with your editor at the appropriate times, as described below. Nuisance, what?
-
- First, let’s look at passing all or part of a file to hAWK, and viewing the result of a run.
- When hAWK is called through Minimal App, its only input option is a single selected file,
- so the simplest approach is to use a single common file as the
- input file for all programs which expect input from all or part of a file, and use the
- setup dialog to set (and save) that file as the input file. Oddly, the simplest file to pick
- is stdout ($tempStdOut, in the same folder that holds Minimal App). There is no
- conflict between passing stdout to a program as input, and then writing to stdout,
- because just before your program is run hAWK will rename your stdout file to
- “$tempOutAsInput” and then pass that name to your program. The “old” version
- of stdout will be used as input, and the “new” version will hold whatever was written
- to stdout during the run. With stdout as your common input file, the approach to use
- for passing all or part of a file from your editor to a hAWK program is:
- • Open the stdout file (i.e. $tempStdOut) in your editor, and leave it open (you can create
- this file by running $EnumSwitch with no input, or create it with your editor - it goes
- in the same folder as Minimal App and the Drag_on Modules folder, at the same level)
- • Copy/Paste the input text over all of stdout, and Save it.
- • Switch to Minimal App, call up hAWK, and select your program.
- • If it’s the very first run, use the “Select input file...” command to select $tempStdOut
- as the specific input file, and then Save Settings so the program will remember this.
- • Run the program.
- • Return to your editor, type a character in the stdout window, and Revert. You’ll see
- what was written to stdout by the program. To view any other created or altered files,
- you’ll need to open them with your editor.
-
- Here’s an example run, to get you going. The example program is $EnumSwitch, which
- takes a list of enum constants and generates a “switch” statement based on them. You
- should be viewing this file with your editor, and also have Minimal App running in its
- own partition under MultiFinder or System 7.
- • Copy the indented line just below with your editor, and Save it as the entire contents
- of $tempStdOut, in the same folder where you’re keeping Minimal App.
- 	{first, second, third, fourth, twilightZone = -99}
- • Leave the $tempStdOut file open.
- • Switch to Minimal App and select hAWK; use the “Main program:” popup menu
- to select “$EnumSwitch” as the program to run.
- • Use the “Select input file...” option under the “Take input from:” popup menu
- to select your “$tempStdOut” file as the input file to use with $EnumSwitch.
- • Click the “Save settings” button so that $EnumSwitch will remember which
- input file to use for subsequent runs.
- • Click the Run button, and wait until the highlighting goes away from the main
- menu bar, signalling that the program is done.
- • Return to your editor, type a character in the $tempStdOut window, and pick
- Revert; you’ll see the results of $EnumSwitch on the line of enums you started with.
-
- Some programs, such as $MFS_SuperReplace, naturally work with a list of files rather
- than just a single file. Here the simplest approach is to pass to your program a single
- input file which contains a list of the actual files to use as input. Again, it is best to
- settle on a single name for the file which contains the file list, and use the setup dialog
- to set the program to take input from this file. Here the name doesn’t matter, and
- something like “Standard File List” would do ($tempStdOut and other standard files
- are best avoided here). It then remains to create the list of files, and to alter the
- program(s) internally so that they will interpret the file list properly.
-
- First, the list of files: it should be a list of full path names, one file per line. You can
- generate the full path name for any single file by running “$EchoFullPathNames” with
- the file in question as input. Given that path, you can then generate full path names for
- other files in the same folder with a bit of copying and replacing of the file name,
- leaving the path the same. Some editors can generate full path names for files, which is
- an easier approach. If you have no easy way of generating full path names you might
- want to create a “master list” of full path names, and selectively copy the needed names
- to your “Standard File List” file before running a hAWK program.
-
- Each program that you want to take input from your file list needs a small addition
- at the beginning. Open the program, and copy the following BEGIN block into the
- program, as the very first block of code in the file:
- BEGIN {while ((getline _specific_file_ < ARGV[1]) > 0)
- 		{
- 			if (length(_specific_file_) > 1 && index(_specific_file_, ":") > 0)
- 				ARGV[ARGC++] = _specific_file_;	# a full path name: add it as an input file
- 		}
- 		close(ARGV[1]);
- 		ARGV[1] = "";	# so the list itself is not read as input
- 	}#end addition
- This persuades the program to take input from the list of files, rather than treating
- the list of files as the input. This may look familiar, as it’s the same alteration
- described in the first section of this chapter for persuading a program to take input
- from a list of files in specific order.
-
- And finally, to run a hAWK program on a list of files:
- • Your “Standard File List” file should contain the exact list of files that you want to
- use as input files, as full path names. Remember to Save it if you change it, before
- running your program.
- • Switch to Minimal App, call up hAWK, and select the program to be run.
- • If it’s the very first run, use the “Select input file...” command to select your file
- containing the file list as the specific input file, and then Save Settings so the program
- will remember this.
- • Run the program.
- • Return to your editor, and Revert stdout as described above if the program writes to
- stdout.
-
-
- ---------------------------
- Calling hAWK from your application
- ---------------------------
- What and how
- Your application, that is, any application for which you have the source code, should
- be a THINK C project. If your application is written for some other C compiler, you
- should be able to modify the supplied source without too much anguish. If your application
- is not written in C you will still be able to call hAWK if your language supports calling
- C–style functions. However, you will have to provide your own equivalent for the
- file “Call_Resource.c”, not a trivial undertaking. The following discussion
- will assume that your application is built from a THINK C project.
-
- Drag_on Modules, of which hAWK is an example, are CODE resources. To call a
- Drag_on Module, you load the first segment of its code (CODE 0), set up a pointer
- to an interface structure which contains file names and “callback” functions, and
- then jump to the starting address of the CODE resource as though it were a C–style
- function. Your application will load a list of Drag_on Modules into a menu for selection
- by the user.
-
- Modifying your application to call hAWK and other Drag_on Modules divides into two
- stages: adding the source file “Call_Resource.c” to your project and inserting two
- function calls in your source; and then, when the basic version has checked out,
- deciding what level of support to supply for callback and result–showing functions.
-
- Drag_on Modules can be called by virtually any application, but considerable enhancement
- is possible if your application supports text windows and files. For example, hAWK can
- take input from the front text window of your application, and relies on your application
- to show the text file stdout if the user requests it. If your application doesn’t support text
- windows and files it can still call hAWK, but some input options and the showing of
- result files will be absent.
-
-
- Getting started
- To get going, add the source file “Call_Resource.c”, in the “code to call Drag_ons”
- folder on the same disk where you found this manual, to your application project. You
- will also need to add the standard ANSI library if it’s not already in your project (this
- won’t add much to the size of your built application). Compile it, and run it as well to
- check for linkage errors. If your application lacks some of the toolbox headers that are
- normally included in the MacHeaders precompiled standard header then you may have to
- explicitly #include them in the file “Call_Resource.c”.
-
- Add two calls in your code
- First, decide which of your application menus to use for showing the Drag_on Modules.
- Then follow the instructions at the top of “Call_Resource.c” in points 2 and 3 which
- describe how and where to place the two calls to functions in “Call_Resource.c”.
- InitCallResources() will load a list of Drag_on Modules into your chosen menu, and
- CallResource() will call a Drag_on Module when it is selected from your menu.
-
- For an example of adding “Call_Resource.c” to an application and inserting the two
- required function calls, see the source code and THINK C project for “Minimal App”
- (the two calls are in “minimalApp.c”, and the copy of “Call_Resource.c” in the
- “Minimal App” folder is identical to the original in “code to call Drag_ons”).
-
- A minimal version
- Verify that line 98 or so of “Call_Resource.c” reads
- #define SUPPORT_LEVEL MINIMAL
- Bring your THINK C project up to date, and build a new version of your application. In
- order for hAWK and company to show up in your menu, the folder “Drag_on Modules”
- (with hAWK inside) needs to be in the same folder as your application, at the same
- level, so do this first before starting up your application.
-
- Start your application, and you should see hAWK listed under the menu you have chosen
- to show Drag_on Modules. Select hAWK, and the setup dialog should appear; however,
- input options under the “Take input from:” popup will be limited to just the
- “Select input file...” option. Select the program “$EchoFileNames”, and then use
- the “Take input from:” option to select any TEXT file for it to use as input. Click
- Run, wait about 2 seconds or until the mouse is back under your control, and then
- check however you like that the file “$tempStdOut” contains the name of the file you
- selected as input for “$EchoFileNames”.
-
- Callbacks, and showing results
- Once you have the above basic version up and running, you should read through the
- “Call_Resource.c” file and decide how much support to provide for the tasks of
- offering input options and showing the “$tempStdOut” result file. An important
- and easily–supported alert function (OKStopAlert) and a function for changing the
- cursor to a watch round out the list of functions that enhance hAWK’s
- performance (or that of any Drag_on Module, for that matter). The more functions
- you support, the more useful hAWK will be to your users.
-
- If you decide to support any of these optional capabilities, also change the
- #define SUPPORT_LEVEL MINIMAL
- statement in “Call_Resource.c” to reflect the level of support you are providing
- (instructions for this are in the file, around line 86).
-
- Finally, around line 131 in “Call_Resource.c” you will see the statement
- static char callerName[] = "MyApp";
- Change the name to the name of your application, and you’re done.
-
- Any enhancements or modifications you make are your own business. However, hAWK
- and most of the source code for hAWK are copyright the Free Software
- Foundation; you can distribute hAWK and the source code for it, provided you follow the
- restrictions contained in the file “COPYING hAWK”, on the same disk where you found this
- manual. Where Dynabyte might be construed as owning the copyright, all rights are
- waived except the right to copyright, this latter only to preserve the former. Catch 23.
-
- -------------
- Modifying hAWK
- -------------
- Introduction
- Building hAWK used to be a nontrivial undertaking. Now, just build the "hAWK.µ" CodeWarrior
- project, merging it into an existing copy of "hAWK" when the merge dialog appears.
- At present, CodeWarrior ANSI libraries suffer from the problem that they allocate
- a 65K pointer and never let go of it, but this is worked around by throwing hAWK
- into its own heap zone when calling it, then dumping the whole heap when done.
-
- Warning: the original PC code that hAWK is based on is old, very old,
- and the modifications that made it run on the Macintosh were rather brutally done. If you plan
- major changes to hAWK, expect some grief along the way.
-
- END hAWK MANUAL
- (OOPS forgot to provide the Reverse Polish expression interpreter - what a tragedy...)
-
-
- -------------------
- Active index
- -------------------
- This index lists line numbers for topics, suitable for use with editors that
- allow you to jump to or “Go to” a selected line number.
-
- | in reg. exp. 1669
- || in patterns 1846
- ~ (matching operator) 1589
- !~ (not match operator) 1628
- π 2089
- \ in reg. exp. 1669
- \1...\9 1695
- \< 1681
- \> 1682
- \B 1684
- \b 1683
- \n 1686
- \t 1685
- \W 1680
- \w 1679
- ! in patterns 1846
- $about the supplied programs 828
- $tempStdIn 553 729
- $tempStdErr 729
- $tempStdOut 681 729 741
- $tempStdOut is temporary 763
- $ to start program name 514
- $ in reg. exp. 1669
- $EnumSwitch 390 1030
- $FilesInOrderTest 2760
- $MFS_SuperLister 2747
- $PatternTester 1915
- $sample programs see 828
- && in patterns 1846
- ( ) in reg. exp. 1669
- * in reg. exp. 1669
- + in reg. exp. 1669
- . in reg. exp. 1669
- >, >> (redirection) 2599
- ? : in patterns 1862
- ? in reg. exp. 1669
- [ ] in reg. exp. 1669
- ^ in reg. exp. 1669
- actions 1936
- All of front text 550
- ANSI a4 3323
- ARGC 1146 1314
- ARGV[] 1136 1314
- arrays 1428
- atan2() 2071
- automatic conversion 1394
- auto version incrementing 2650
- AWK and GAWK 257
- backslash to break long lines 1087
- beep() 2191
- BEGIN (pattern) 1545
- break 2340
- breaking lines 1079
- built–in string and file functions 2098
- built–in variables 1314
- built–in numeric functions 2071
- Call_Resource.c 3231
- calling hAWK from your application 3199
- cancelling a run 724
- close() 2671
- command line 1136
- comments in the source 1002 1104
- comparison operators in patterns 1575
- compound patterns 1845
- concurrent and immediate modes 449
- continue 2344
- control–break 2978
- control-flow statements 2300
- concatenation 1999
- constants 1236
- conversion, numbers and strings 1394
- copy() 2193
- cos() 2071
- delete 1461
- do-while statement 2322
- empty statements 2433
- END (pattern) 1545
- end–buffered input 3014
- example hAWK programs 828
- exists 2203
- exit 2353
- exp() 2071
- expression operators 2013
- expressions (as patterns) 1565
- expressions in actions 1956
- fields ($1 $2 etc) 1017 1269
- fdate() 2205
- FILENAME 1314
- files, closing 2671
- FNR 1314
- for (var in array) 1460 2337
- for (;;) statement 2327
- Front text selection 549
- FS (field separator) 1281 1314 2690
- fsize() 2210
- full path name, splitting 1532 2645
- full path names 1211 1355 1534 2615
- functions, user–defined 2440
- function, local variables 1363
- GAWK and AWK 249
- getclip() 2211
- getline 2720
- grouping and breaking lines 1079
- gsub() 2098
- hAWK programs (folder) 514
- hAWK, calling from your application 3199
- hAWK, installing 180
- hAWK() function 2781
- if statement 2313
- IGNORECASE 1314
- immediate and concurrent modes 449
- int() 2071
- in (operator) 1448
- index() 2098
- input files, in order 2875
- input on demand 3004
- input selection for a program 525
- installing hAWK 180
- length() 2098
- library files 655
- lines, breaking and grouping 1079
- list() 2223
- local variables 1363 2473
- log() 2071
- lookup() 2130
- Main program: (popup) 514
- match() 2098
- metacharacters 1669
- MFS selected files 559
- Minimal App 3247
- minimalApp.c 3248
- missing pattern 1526
- modifying hAWK 3294
- multiline records 2707
- name conventions for programs 514
- nested() 2228
- next 2349
- NF 1314
- no input, specifying 579
- NR 1314
- null string 1256
- number versus string 1394
- numeric functions, built–in 2071
- octal in reg. exp. 1702
- OFMT 1314
- OFS 1314
- tolower() 2098
- operators, table of 2022
- ordering input files 2875
- ORS 1314
- output into files 2599
- patterns and actions 1516
- path names 1211 1355 1534 2615
- patterns 1514
- pattern, missing 1526
- patterns, summary 1894
- pipes (none) 276
- presetting variables 587
- print (preview of) 1979
- print (details) 2500
- printf statement 2525
- printing this manual 221
- program name conventions 514
- program, input selection 525
- prompt() 2163
- punctuation, inside / / 1612
- punctuation, inside quotes 1619
- putclip() 2217
- rand() 2071
- range patterns 1870
- records ($0) 1017 1269
- redirecting output 2599
- references 233
- regular expressions 1633
- regular expressions, examples 1741
- remove() 2236
- rename() 2240
- return 2449
- RLENGTH 1314
- rolling buffer for input 3055
- RS (record separator) 1314 1274
- RSTART 1314
- Run button 441 716
- RUNERR 1314
- sample hAWK programs 828
- Save settings (button) 700
- setup dialog 419
- setup, saving 700
- setting variables before a run 587 1214 1383
- Selecting input for a program 525
- Select all of stdout (checkbox) 695
- Select input file… 571
- Select unlisted program… 517
- Show stdout (checkbox) 688
- sin() 2071
- sort() 2145
- SortLibrary, sample library 674
- specific order for input files 2875
- split() 2098
- split full path name 1532 2645
- sprintf() (see also printf) 2538 2098
- sqrt() 2071
- srand() 2071
- STDPATH 1314 2621 2953
- standard input and output 729
- statement grouping with {} 2310
- stderr 2632
- stdout 2632
- string functions, built–in 2098
- string-matching patterns 1589
- string versus number 1394
- sub() 2098
- substr() 2098
- SUBSEP 1314 1440
- summary of patterns 1894
- supplied hAWK programs 828
- system 278
- Take input from: (popup) 549
- TIME builtin variable 1361
- time() 2159
- toupper() 2098
- uninitialized variables 1256
- unix a4 library 3323
- user-defined functions 2440
- variables 1236
- variable, setting before a run 587 1214 1383
- version incrementing 2650 (see also $TabsToSpaces)
- while statement 2316
-